3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages

* 3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages
@ 2013-08-09 17:18 ` Bjorn Helgaas
  0 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2013-08-09 17:18 UTC (permalink / raw)
  To: e1000-devel; +Cc: linux-pci, linux-kernel

When I enable VFs via sysfs on an Intel X540-AT, I see an endless stream of

    ixgbevf 0000:08:10.2: Last Request of type 03 to PF Nacked

messages.  This is on an HP z420 with the Intel X540-AT in an external Magma
PCIe expansion chassis.  No cable is attached to the X540-AT.

ixgbe is built as a module and is auto-loaded during boot, with no VFs
enabled.  The "Last request Nacked" messages start when I enable VFs
with:

    # echo -n 8 > /sys/bus/pci/devices/0000:08:00.0/sriov_numvfs
    ixgbe 0000:08:00.0 eth1: SR-IOV enabled with 8 VFs
    pci 0000:08:10.0: [8086:1515] type 00 class 0x020000
    pci 0000:08:10.2: [8086:1515] type 00 class 0x020000
    ...
    ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function Network Driver - version 2.7.12-k
    ...
    ixgbevf 0000:08:10.2: Last Request of type 03 to PF Nacked
    ...

This happens with v3.11-rc4, v3.10, and v3.9, which is as far back as
I checked.  Complete console log and lspci output are here:

    http://helgaas.com/linux/ixgbe/z420.log
    http://helgaas.com/linux/ixgbe/lspci
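
When triaging a console log like the one linked above, it can help to tally
the "Nacked" messages per VF and per request type.  The following is only a
sketch: count_nacked() and its regular expression are illustrative names, and
the pattern assumes the exact message format shown in the excerpt above.

```python
import re
from collections import Counter

# Assumed message format, copied from the log excerpt in this report:
#   ixgbevf 0000:08:10.2: Last Request of type 03 to PF Nacked
NACK_RE = re.compile(
    r"ixgbevf (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Last Request of type (?P<type>[0-9a-f]{2}) to PF Nacked"
)


def count_nacked(lines):
    """Return a Counter keyed by (VF PCI address, request type)."""
    counts = Counter()
    for line in lines:
        match = NACK_RE.search(line)
        if match:
            counts[(match.group("bdf"), match.group("type"))] += 1
    return counts
```

Passing an open log file (or the lines of dmesg output) to count_nacked()
shows whether a single VF or all of them are affected, and whether the request
type is always the same.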

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages
  2013-08-09 17:18 ` Bjorn Helgaas
@ 2013-08-13 21:54   ` Skidmore, Donald C
  -1 siblings, 0 replies; 28+ messages in thread
From: Skidmore, Donald C @ 2013-08-13 21:54 UTC (permalink / raw)
  To: Bjorn Helgaas, e1000-devel; +Cc: linux-pci, linux-kernel

Hey Bjorn,

Sorry for the slow reply; I was on vacation last week and have been playing catch-up on my email.

We were unable to recreate your failure here locally, so I have some additional questions.  First off, you mentioned it was failing as far back as v3.9; was it ever working for you?  If so, bisecting would be really helpful since, as I mentioned, we have been unable to cause the failure in house.  If not, could you see if the problem still occurs without the external Magma PCIe expansion chassis?  This is of course assuming that you can plug the X540 into your system without it.

One other note: we are in the process of testing several patches (in house) that touch the code you are seeing the error in.  I can't say they would have any effect on what you are seeing, since I'm not sure what that is, but they may be worth testing once they get upstream.

Thanks,
-Don Skidmore <donald.c.skidmore@intel.com>
	
> -----Original Message-----
> From: Bjorn Helgaas [mailto:bhelgaas@google.com]
> Sent: Friday, August 09, 2013 10:19 AM
> To: e1000-devel@lists.sourceforge.net
> Cc: linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to
> PF Nacked" messages
> 
> When I enable VFs via sysfs on an Intel X540-AT, I see an endless stream of
> 
>     ixgbevf 0000:08:10.2: Last Request of type 03 to PF Nacked
> 
> messages.  This on an HP z420 with the Intel X540-AT in external Magma PCIe
> expansion chassis.  No cable is attached to the X540-AT.
> 
> ixgbe is built as a module and is auto-loaded during boot, with no VFs
> enabled.  The "Last request Nacked" messages start when I enable VFs
> with:
> 
>     # echo -n 8 > /sys/bus/pci/devices/0000:08:00.0/sriov_numvfs
>     ixgbe 0000:08:00.0 eth1: SR-IOV enabled with 8 VFs
>     pci 0000:08:10.0: [8086:1515] type 00 class 0x020000
>     pci 0000:08:10.2: [8086:1515] type 00 class 0x020000
>     ...
>     ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function Network Driver -
> version 2.7.12-k
>     ...
>     ixgbevf 0000:08:10.2: Last Request of type 03 to PF Nacked
>     ...
> 
> This happens with v3.11-rc4, v3.10, and v3.9, which is as far back as I checked.
> Complete console log and lspci output are here:
> 
>     http://helgaas.com/linux/ixgbe/z420.log
>     http://helgaas.com/linux/ixgbe/lspci
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages
  2013-08-13 21:54   ` Skidmore, Donald C
@ 2013-08-14  2:23     ` Bjorn Helgaas
  -1 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2013-08-14  2:23 UTC (permalink / raw)
  To: Skidmore, Donald C; +Cc: e1000-devel, linux-pci, linux-kernel

On Tue, Aug 13, 2013 at 3:54 PM, Skidmore, Donald C
<donald.c.skidmore@intel.com> wrote:

> We were unable to recreate your failure here locally so I have some additional questions.  First off you mentioned it was failing as far back as v3.9, was it ever working for you?  If so bisecting would be really helpful as I mentioned we have been unable to cause the failure in house.

I'm not aware of any working version.  I'm exercising the sysfs
SR-IOV configuration, which I think appeared in v3.8 or so.

>  If not could you see if the problem still occurs without the external Magma PCIe expansion chassis, this is of course assuming that you can plug the X540 into your system without it.

I played with this a little more and found this:

1) Magma card in z420, connected to chassis containing X540: fails
(original report)
2) X540 in z420, Magma card in z420, connected to empty chassis: fails
3) X540 in z420, Magma card in z420 but no cable to chassis: works

The only differences I've noticed so far between configs 2 & 3 are
the bus numbers and the IRQ assignments:

Config 2 (failing):
  pci 0000:0c:00.0: [8086:1528] type 00 class 0x020000
  pci 0000:0c:00.0: reg 0x10: [mem 0xdac00000-0xdadfffff 64bit pref]
  ixgbe 0000:0c:00.0: irq 82 for MSI/MSI-X
  IRQ 79: 79
  IRQ 80: eth0
  IRQ 81: snd_hda_intel
  IRQ 82-93: eth1-TxRx-0 through eth1-TxRx-11
  IRQ 94: eth1

Config 3 (working):
  pci 0000:04:00.0: [8086:1528] type 00 class 0x020000
  pci 0000:04:00.0: reg 0x10: [mem 0xdac00000-0xdadfffff 64bit pref]
  ixgbe 0000:04:00.0: irq 75 for MSI/MSI-X
  IRQ 72: ahci
  IRQ 73: eth0
  IRQ 74: snd_hda_intel
  IRQ 75-86: eth1-TxRx-0 through eth1-TxRx-11
  IRQ 87: eth1

I'll try to narrow this down a little more; I'm just giving you this
preliminary info in case it rings any bells for you.
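
The comparison of the two configs above can be done mechanically.  The sketch
below parses the two "pci" log lines of each config and reports which fields
differ; summarize() and differing_keys() are hypothetical helpers, and the
regular expressions assume the log format exactly as quoted above.

```python
import re

# Assumed format of the kernel log lines quoted in the two listings above.
ID_RE = re.compile(
    r"pci [0-9a-f]{4}:(?P<bus>[0-9a-f]{2}):[0-9a-f]{2}\.[0-7]: "
    r"\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
)
BAR_RE = re.compile(r"reg 0x10: \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)")


def summarize(log_lines):
    """Collect bus number, vendor/device ID and BAR 0 range from an excerpt."""
    info = {}
    for line in log_lines:
        m = ID_RE.search(line)
        if m:
            info["bus"] = int(m.group("bus"), 16)
            info["id"] = (m.group("vendor"), m.group("device"))
        m = BAR_RE.search(line)
        if m:
            info["bar0"] = (int(m.group(1), 16), int(m.group(2), 16))
    return info


def differing_keys(a, b):
    """Return the sorted keys whose values differ between two summaries."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


config2 = summarize([
    "pci 0000:0c:00.0: [8086:1528] type 00 class 0x020000",
    "pci 0000:0c:00.0: reg 0x10: [mem 0xdac00000-0xdadfffff 64bit pref]",
])
config3 = summarize([
    "pci 0000:04:00.0: [8086:1528] type 00 class 0x020000",
    "pci 0000:04:00.0: reg 0x10: [mem 0xdac00000-0xdadfffff 64bit pref]",
])
print(differing_keys(config2, config3))  # prints ['bus']
```

For these two excerpts only the bus number differs; the device ID and the
BAR 0 range are identical.  The IRQ lines are not covered by this sketch.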

>> -----Original Message-----
>> From: Bjorn Helgaas [mailto:bhelgaas@google.com]
>> Sent: Friday, August 09, 2013 10:19 AM
>> To: e1000-devel@lists.sourceforge.net
>> Cc: linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org
>> Subject: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to
>> PF Nacked" messages
>>
>> When I enable VFs via sysfs on an Intel X540-AT, I see an endless stream of
>>
>>     ixgbevf 0000:08:10.2: Last Request of type 03 to PF Nacked
>>
>> messages.  This on an HP z420 with the Intel X540-AT in external Magma PCIe
>> expansion chassis.  No cable is attached to the X540-AT.
>>
>> ixgbe is built as a module and is auto-loaded during boot, with no VFs
>> enabled.  The "Last request Nacked" messages start when I enable VFs
>> with:
>>
>>     # echo -n 8 > /sys/bus/pci/devices/0000:08:00.0/sriov_numvfs
>>     ixgbe 0000:08:00.0 eth1: SR-IOV enabled with 8 VFs
>>     pci 0000:08:10.0: [8086:1515] type 00 class 0x020000
>>     pci 0000:08:10.2: [8086:1515] type 00 class 0x020000
>>     ...
>>     ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function Network Driver -
>> version 2.7.12-k
>>     ...
>>     ixgbevf 0000:08:10.2: Last Request of type 03 to PF Nacked
>>     ...
>>
>> This happens with v3.11-rc4, v3.10, and v3.9, which is as far back as I checked.
>> Complete console log and lspci output are here:
>>
>>     http://helgaas.com/linux/ixgbe/z420.log
>>     http://helgaas.com/linux/ixgbe/lspci
>>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages
  2013-08-14  2:23     ` Bjorn Helgaas
@ 2013-08-20 23:08       ` Bjorn Helgaas
  -1 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2013-08-20 23:08 UTC (permalink / raw)
  To: Skidmore, Donald C; +Cc: e1000-devel, linux-pci, linux-kernel, Don Dutile

[+cc Don Dutile]

On Tue, Aug 13, 2013 at 8:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Tue, Aug 13, 2013 at 3:54 PM, Skidmore, Donald C
> <donald.c.skidmore@intel.com> wrote:
>
>> We were unable to recreate your failure here locally so I have some additional questions.  First off you mentioned it was failing as far back as v3.9, was it ever working for you?  If so bisecting would be really helpful as I mentioned we have been unable to cause the failure in house.
>
> I'm not aware of any working version.  I'm exercising in the sysfs
> SR-IOV configuration, which I think appeared in v3.8 or so.
>
>>  If not could you see if the problem still occurs without the external Magma PCIe expansion chassis, this is of course assuming that you can plug the X540 into your system without it.
>
> I played with this a little more and found this:
>
> 1) Magma card in z420, connected to chassis containing X540: fails
> (original report)
> 2) X540 in z420, Magma card in z420, connected to empty chassis: fails
> 3) X540 in z420, Magma card in z420 but no cable to chassis: works
>
> The only difference I've noticed so far between configs 2 & 3 are
> different bus numbers and different IRQ assignments:
>
> Config 2 (failing):
>   pci 0000:0c:00.0: [8086:1528] type 00 class 0x020000
>   pci 0000:0c:00.0: reg 0x10: [mem 0xdac00000-0xdadfffff 64bit pref]
>   ixgbe 0000:0c:00.0: irq 82 for MSI/MSI-X
>   IRQ 79: 79
>   IRQ 80: eth0
>   IRQ 81: snd_hda_intel
>   IRQ: 82-93 eth1-TxRx-0 through eth1-TxRx-11
>   IRQ 94: eth1
>
> Config 3 (working):
>   pci 0000:04:00.0: [8086:1528] type 00 class 0x020000
>   pci 0000:04:00.0: reg 0x10: [mem 0xdac00000-0xdadfffff 64bit pref]
>   ixgbe 0000:04:00.0: irq 75 for MSI/MSI-X
>   IRQ 72: ahci
>   IRQ 73: eth0
>   IRQ 74: snd_hda_intel
>   IRQ 75-86: eth1-TxRx-0 through eth1-TxRx-11
>   IRQ 87: eth1
>
> I'll try to narrow this down a little more; I'm just giving you this
> preliminary info in case it rings any bells for you.

Sorry, I haven't gotten anywhere with this yet.  I opened
https://bugzilla.kernel.org/show_bug.cgi?id=60776 and attached logs
and lspci info from v3.11-rc6.

Bjorn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages
  2013-08-20 23:08       ` Bjorn Helgaas
@ 2013-08-20 23:37         ` Bjorn Helgaas
  -1 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2013-08-20 23:37 UTC (permalink / raw)
  To: Skidmore, Donald C; +Cc: e1000-devel, linux-pci, linux-kernel, Don Dutile

On Tue, Aug 20, 2013 at 5:08 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Tue, Aug 13, 2013 at 8:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:

>> I played with this a little more and found this:
>>
>> 1) Magma card in z420, connected to chassis containing X540: fails
>> (original report)
>> 2) X540 in z420, Magma card in z420, connected to empty chassis: fails
>> 3) X540 in z420, Magma card in z420 but no cable to chassis: works

For what it's worth, I tried config 3 again with v3.11-rc6, and it
failed the same way.  I haven't bothered with config 2.  It's not 100%
reproducible, but at least it doesn't seem related to the expansion
chassis.

I attached the logs from config 3 to
https://bugzilla.kernel.org/show_bug.cgi?id=60776

Bjorn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages
  2013-08-20 23:37         ` Bjorn Helgaas
@ 2013-08-23 16:52           ` Bjorn Helgaas
  -1 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2013-08-23 16:52 UTC (permalink / raw)
  To: Skidmore, Donald C; +Cc: e1000-devel, linux-pci, linux-kernel, Don Dutile

On Tue, Aug 20, 2013 at 5:37 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Tue, Aug 20, 2013 at 5:08 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> On Tue, Aug 13, 2013 at 8:23 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>
>>> I played with this a little more and found this:
>>>
>>> 1) Magma card in z420, connected to chassis containing X540: fails
>>> (original report)
>>> 2) X540 in z420, Magma card in z420, connected to empty chassis: fails
>>> 3) X540 in z420, Magma card in z420 but no cable to chassis: works
>
> For what it's worth, I tried config 3 again with v3.11-rc6, and it
> failed the same way.  I haven't bothered with config 2.  It's not 100%
> reproducible, but at least it doesn't seem related to the expansion
> chassis.
>
> I attached the logs from config 3 to
> https://bugzilla.kernel.org/show_bug.cgi?id=60776

Is there anything I can do to help debug this?  Add instrumentation,
etc.?  It seems like I'm doing the simplest possible thing -- just
writing to the sysfs sriov_numvfs file to enable VFs.

I almost think it must be related to my config somehow if nobody else
is seeing this, but at the same time, my config also seems the
simplest possible, so I don't know what I could be doing that's
unusual.
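
The sysfs step described above can be wrapped in a small helper for repeated
test runs.  This is a sketch under stated assumptions, not the procedure used
in this report: set_numvfs() is a hypothetical name, it relies on the
sriov_totalvfs attribute sitting next to sriov_numvfs, and it assumes VFs must
be disabled (by writing 0) before switching from one non-zero count to
another.

```python
from pathlib import Path


def set_numvfs(dev_dir, count):
    """Write `count` to sriov_numvfs under dev_dir and return the read-back value.

    dev_dir is the PF's sysfs directory, for example
    /sys/bus/pci/devices/0000:08:00.0 (the address used in this thread).
    """
    dev = Path(dev_dir)
    total = int((dev / "sriov_totalvfs").read_text())
    if not 0 <= count <= total:
        raise ValueError(f"requested {count} VFs, device supports 0..{total}")

    numvfs = dev / "sriov_numvfs"
    current = int(numvfs.read_text())
    if current == count:
        return current

    # Assumption: changing between two non-zero counts needs an
    # intermediate write of 0 to disable the existing VFs first.
    if current != 0 and count != 0:
        numvfs.write_text("0")
    numvfs.write_text(str(count))
    return int(numvfs.read_text())
```

Called as set_numvfs("/sys/bus/pci/devices/0000:08:00.0", 8), it corresponds
to the echo command in the original report, with range checking added.
Writing to the real sysfs file requires root.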

Bjorn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages
  2013-08-23 16:52           ` Bjorn Helgaas
@ 2013-08-23 18:25             ` Skidmore, Donald C
  -1 siblings, 0 replies; 28+ messages in thread
From: Skidmore, Donald C @ 2013-08-23 18:25 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: e1000-devel, linux-pci, linux-kernel, Don Dutile

> -----Original Message-----
> From: Bjorn Helgaas [mailto:bhelgaas@google.com]
> Sent: Friday, August 23, 2013 9:53 AM
> To: Skidmore, Donald C
> Cc: e1000-devel@lists.sourceforge.net; linux-pci@vger.kernel.org; linux-
> kernel@vger.kernel.org; Don Dutile
> Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00
> to PF Nacked" messages
> 
> On Tue, Aug 20, 2013 at 5:37 PM, Bjorn Helgaas <bhelgaas@google.com>
> wrote:
> > On Tue, Aug 20, 2013 at 5:08 PM, Bjorn Helgaas <bhelgaas@google.com>
> wrote:
> >> On Tue, Aug 13, 2013 at 8:23 PM, Bjorn Helgaas <bhelgaas@google.com>
> wrote:
> >
> >>> I played with this a little more and found this:
> >>>
> >>> 1) Magma card in z420, connected to chassis containing X540: fails
> >>> (original report)
> >>> 2) X540 in z420, Magma card in z420, connected to empty chassis:
> >>> fails
> >>> 3) X540 in z420, Magma card in z420 but no cable to chassis: works
> >
> > For what it's worth, I tried config 3 again with v3.11-rc6, and it
> > failed the same way.  I haven't bothered with config 2.  It's not 100%
> > reproducible, but at least it doesn't seem related to the expansion
> > chassis.
> >
> > I attached the logs from config 3 to
> > https://bugzilla.kernel.org/show_bug.cgi?id=60776
> 
> Is there anything I can do to help debug this?  Add instrumentation, etc.?  It
> seems like I'm doing the simplest possible thing -- just writing to the sysfs
> sriov_numvfs file to enable VFs.
> 
> I almost think it must be related to my config somehow if nobody else is
> seeing this, but at the same time, my config also seems the simplest possible,
> so I don't know what I could be doing that's unusual.
> 
> Bjorn

Hey Bjorn,

I may be a little confused, so bear with me.

Option 1 = (your normal setup) Magma card plugged into chassis, X540 in chassis.
Option 2 = Magma card plugged into chassis, X540 in z420 system.
Option 3 = Magma card UNplugged from chassis, X540 in z420 system.

Options 1 & 2 - always fail
Option 3 - sometimes fails (unsure at what rate failure occurs)

Please correct me if I messed any of that up. :)

Another question I have relates to the lspci output you supplied in the bugzilla.  I'm not seeing the VF devices (e.g. 08:10.0); did you run lspci before you created the VFs?  If so, could we see one taken while the failure was occurring?
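
As a cross-check for which addresses to look for in lspci, the VF addresses
can be derived from the PF address with the SR-IOV routing-ID rule (PF routing
ID plus First VF Offset plus index times VF Stride).  The sketch below is
illustrative only: vf_address() is a hypothetical helper, and the offset 0x80
and stride 2 are assumptions inferred from the VFs at 08:10.0 and 08:10.2 in
the original report, not values read from the device's SR-IOV capability.

```python
def vf_address(pf_bus, pf_dev, pf_fn, vf_index, first_vf_offset, vf_stride):
    """Return (bus, device, function) of VF number vf_index (0-based).

    VF routing ID = PF routing ID + First VF Offset + vf_index * VF Stride,
    where a routing ID is bus[15:8] | device[7:3] | function[2:0].
    """
    pf_rid = (pf_bus << 8) | (pf_dev << 3) | pf_fn
    vf_rid = (pf_rid + first_vf_offset + vf_index * vf_stride) & 0xFFFF
    return (vf_rid >> 8, (vf_rid >> 3) & 0x1F, vf_rid & 0x7)


# Assumed values (see above): PF at 08:00.0, offset 0x80, stride 2.
for i in range(2):
    bus, dev, fn = vf_address(0x08, 0x00, 0, i, first_vf_offset=0x80, vf_stride=2)
    print(f"VF{i}: {bus:02x}:{dev:02x}.{fn}")
# prints:
# VF0: 08:10.0
# VF1: 08:10.2
```

With these assumed values the first two VFs land on the addresses seen in the
original log.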

Also could you download the latest ixgbevf from source forge?

https://sourceforge.net/projects/e1000/files/ixgbevf%20stable/

If we add debugging messages it will be easier to patch this driver and it contains our latest validated code base.

Thanks,
-Don Skidmore <donald.c.skidmore@intel.com>




^ permalink raw reply	[flat|nested] 28+ messages in thread
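
For readers reproducing the report: the sysfs write discussed above can be wrapped in a small helper. This is only a sketch under stated assumptions. The function name set_numvfs is hypothetical, and it relies on the standard PCI sysfs attributes sriov_totalvfs and sriov_numvfs. It also assumes the kernel refuses to change one nonzero VF count directly to another, so it writes 0 first in that case.

```shell
#!/bin/sh
# set_numvfs PF_DIR COUNT   (hypothetical helper, not part of any driver package)
# PF_DIR is the PF's sysfs directory, e.g. /sys/bus/pci/devices/0000:08:00.0
set_numvfs() {
    pf_dir=$1
    count=$2

    # Upper bound advertised by the device's SR-IOV capability
    total=$(cat "$pf_dir/sriov_totalvfs") || return 1
    if [ "$count" -gt "$total" ]; then
        echo "requested $count VFs, device supports at most $total" >&2
        return 1
    fi

    # Assumption: going from one nonzero count to another needs a disable first
    current=$(cat "$pf_dir/sriov_numvfs") || return 1
    if [ "$current" -ne 0 ] && [ "$current" -ne "$count" ]; then
        echo 0 > "$pf_dir/sriov_numvfs" || return 1
    fi

    echo "$count" > "$pf_dir/sriov_numvfs"
}
```

On real hardware this needs root, for example: set_numvfs /sys/bus/pci/devices/0000:08:00.0 8. Passing 0 as COUNT disables the VFs again.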

* Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages
  2013-08-23 18:25             ` Skidmore, Donald C
@ 2013-08-23 18:52               ` Bjorn Helgaas
  -1 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2013-08-23 18:52 UTC (permalink / raw)
  To: Skidmore, Donald C; +Cc: e1000-devel, linux-pci, linux-kernel, Don Dutile

On Fri, Aug 23, 2013 at 06:25:06PM +0000, Skidmore, Donald C wrote:
> > -----Original Message-----
> > From: Bjorn Helgaas [mailto:bhelgaas@google.com]
> > Sent: Friday, August 23, 2013 9:53 AM
> > To: Skidmore, Donald C
> > Cc: e1000-devel@lists.sourceforge.net; linux-pci@vger.kernel.org; linux-
> > kernel@vger.kernel.org; Don Dutile
> > Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00
> > to PF Nacked" messages
> > 
> > On Tue, Aug 20, 2013 at 5:37 PM, Bjorn Helgaas <bhelgaas@google.com>
> > wrote:
> > > On Tue, Aug 20, 2013 at 5:08 PM, Bjorn Helgaas <bhelgaas@google.com>
> > wrote:
> > >> On Tue, Aug 13, 2013 at 8:23 PM, Bjorn Helgaas <bhelgaas@google.com>
> > wrote:
> > >
> > >>> I played with this a little more and found this:
> > >>>
> > >>> 1) Magma card in z420, connected to chassis containing X540: fails
> > >>> (original report)
> > >>> 2) X540 in z420, Magma card in z420, connected to empty chassis:
> > >>> fails
> > >>> 3) X540 in z420, Magma card in z420 but no cable to chassis: works
> > >
> > > For what it's worth, I tried config 3 again with v3.11-rc6, and it
> > > failed the same way.  I haven't bothered with config 2.  It's not 100%
> > > reproducible, but at least it doesn't seem related to the expansion
> > > chassis.
> > >
> > > I attached the logs from config 3 to
> > > https://bugzilla.kernel.org/show_bug.cgi?id=60776
> > 
> > Is there anything I can do to help debug this?  Add instrumentation, etc.?  It
> > seems like I'm doing the simplest possible thing -- just writing to the sysfs
> > sriov_numvfs file to enable VFs.
> > 
> > I almost think it must be related to my config somehow if nobody else is
> > seeing this, but at the same time, my config also seems the simplest possible,
> > so I don't know what I could be doing that's unusual.
> > 
> > Bjorn
> 
> Hey Bjorn,
> 
> I may be a little confused, so bear with me.
> 
> Option 1 = (your normal setup) Magma card plugged into chassis, X540 in chassis.
> Option 2 = Magma card plugged into chassis, X540 in z420 system.
> Option 3 = Magma card UNplugged from chassis, X540 in z420 system.
> 
> Options 1 & 2 - always fail
> Option 3 - sometimes fails (unsure at what rate failure occurs)
> 
> Please correct me if I messed any of that up. :)

Generally correct.  I've seen failures in all three configs, so I'm only
concerned with the simplest for now (config 3, no expansion chassis).

> Another question I have relates to the lspci output you supplied in the bugzilla.  I'm not seeing the VF devices (e.g. 08:10.0).  Did you run lspci before you created the VFs?  If so, could we see one taken while the failure was occurring?

That's correct, I collected the lspci output before reproducing the
problem.  I can't easily collect lspci afterwards because the machine isn't
responsive after the problem starts.

> Also, could you download the latest ixgbevf from SourceForge?
> 
> https://sourceforge.net/projects/e1000/files/ixgbevf%20stable/
> 
> If we add debugging messages, it will be easier to patch this driver, and it contains our latest validated code base.

I can do that if it turns out to be necessary.  But John Haller gave me a
good clue off-list: 

John wrote:
> I assume you want the VFs to be instantiated in a VM.  To do this,
> you need to blacklist the ixgbevf driver in the host (or not
> compile it into the host), or it will try to associate the driver
> in the host, rather than in the VM where you want it.  Then, the
> VM needs the ixgbevf driver, which will hopefully do a better job
> of talking to the mailbox in the host.  There is some work to
> assign the VF(s) to the VM, but I don't remember that offhand.

I don't have any VMs (I started this whole thing because I was
looking at a PCI hotplug issue related to SR-IOV, so I don't really
care about VMs).

So the ixgbevf driver on the *host* is claiming the new VFs, and it
sounds like maybe it can't handle that?

Bjorn

^ permalink raw reply	[flat|nested] 28+ messages in thread
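
Whether the host's ixgbevf driver has in fact claimed the new VFs can be checked from sysfs, provided the machine is still responsive enough to run a command. The following is a minimal sketch, not something posted in the thread. The function name list_vf_drivers is hypothetical. It relies on the virtfnN symlinks that the PCI core creates under a PF's sysfs directory and on the driver symlink present on any bound device, and it assumes a readlink that supports -f.

```shell
#!/bin/sh
# list_vf_drivers PF_DIR   (hypothetical helper)
# Prints one line per VF: "<PCI address> <bound driver, or (none)>"
list_vf_drivers() {
    pf_dir=$1
    for vf in "$pf_dir"/virtfn*; do
        # The glob stays literal when no VFs exist; skip that case
        [ -e "$vf" ] || continue

        # virtfnN is a symlink to the VF's own device directory
        addr=$(basename "$(readlink -f "$vf")")

        # A bound device has a "driver" symlink pointing at the driver
        if [ -e "$vf/driver" ]; then
            drv=$(basename "$(readlink -f "$vf/driver")")
        else
            drv="(none)"
        fi
        printf '%s %s\n' "$addr" "$drv"
    done
}
```

For the PF in the original report the call would be: list_vf_drivers /sys/bus/pci/devices/0000:08:00.0. Lines ending in ixgbevf would indicate that the host driver has bound to those VFs.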

* RE: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages
  2013-08-23 18:52               ` Bjorn Helgaas
@ 2013-08-23 20:37                 ` Skidmore, Donald C
  -1 siblings, 0 replies; 28+ messages in thread
From: Skidmore, Donald C @ 2013-08-23 20:37 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: e1000-devel, linux-pci, linux-kernel, Don Dutile

> -----Original Message-----
> From: Bjorn Helgaas [mailto:bhelgaas@google.com]
> Sent: Friday, August 23, 2013 11:53 AM
> To: Skidmore, Donald C
> Cc: e1000-devel@lists.sourceforge.net; linux-pci@vger.kernel.org; linux-
> kernel@vger.kernel.org; Don Dutile
> Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00
> to PF Nacked" messages
> 
> On Fri, Aug 23, 2013 at 06:25:06PM +0000, Skidmore, Donald C wrote:
> > > -----Original Message-----
> > > From: Bjorn Helgaas [mailto:bhelgaas@google.com]
> > > Sent: Friday, August 23, 2013 9:53 AM
> > > To: Skidmore, Donald C
> > > Cc: e1000-devel@lists.sourceforge.net; linux-pci@vger.kernel.org;
> > > linux- kernel@vger.kernel.org; Don Dutile
> > > Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request
> > > of type 00 to PF Nacked" messages
> > >
> > > On Tue, Aug 20, 2013 at 5:37 PM, Bjorn Helgaas <bhelgaas@google.com>
> > > wrote:
> > > > On Tue, Aug 20, 2013 at 5:08 PM, Bjorn Helgaas
> > > > <bhelgaas@google.com>
> > > wrote:
> > > >> On Tue, Aug 13, 2013 at 8:23 PM, Bjorn Helgaas
> > > >> <bhelgaas@google.com>
> > > wrote:
> > > >
> > > >>> I played with this a little more and found this:
> > > >>>
> > > >>> 1) Magma card in z420, connected to chassis containing X540:
> > > >>> fails (original report)
> > > >>> 2) X540 in z420, Magma card in z420, connected to empty chassis:
> > > >>> fails
> > > >>> 3) X540 in z420, Magma card in z420 but no cable to chassis:
> > > >>> works
> > > >
> > > > For what it's worth, I tried config 3 again with v3.11-rc6, and it
> > > > failed the same way.  I haven't bothered with config 2.  It's not
> > > > 100% reproducible, but at least it doesn't seem related to the
> > > > expansion chassis.
> > > >
> > > > I attached the logs from config 3 to
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=60776
> > >
> > > Is there anything I can do to help debug this?  Add instrumentation,
> > > etc.?  It seems like I'm doing the simplest possible thing -- just
> > > writing to the sysfs sriov_numvfs file to enable VFs.
> > >
> > > I almost think it must be related to my config somehow if nobody
> > > else is seeing this, but at the same time, my config also seems the
> > > simplest possible, so I don't know what I could be doing that's unusual.
> > >
> > > Bjorn
> >
> > Hey Bjorn,
> >
> > I may be a little confused, so bear with me.
> >
> > Option 1 = (your normal setup) Magma card plugged into chassis, X540 in chassis.
> > Option 2 = Magma card plugged into chassis, X540 in z420 system.
> > Option 3 = Magma card UNplugged from chassis, X540 in z420 system.
> >
> > Options 1 & 2 - always fail
> > Option 3 - sometimes fails (unsure at what rate failure occurs)
> >
> > Please correct me if I messed any of that up. :)
> 
> Generally correct.  I've seen failures in all three configs, so I'm only
> concerned with the simplest for now (config 3, no expansion chassis).
> 
> > Another question I have relates to the lspci output you supplied in the
> > bugzilla.  I'm not seeing the VF devices (e.g. 08:10.0).  Did you run lspci
> > before you created the VFs?  If so, could we see one taken while the failure
> > was occurring?
> 
> That's correct, I collected the lspci output before reproducing the problem.  I
> can't easily collect lspci afterwards because the machine isn't responsive
> after the problem starts.
> 
> > Also, could you download the latest ixgbevf from SourceForge?
> >
> > https://sourceforge.net/projects/e1000/files/ixgbevf%20stable/
> >
> > If we add debugging messages, it will be easier to patch this driver, and it
> > contains our latest validated code base.
> 
> I can do that if it turns out to be necessary.  But John Haller gave me a good
> clue off-list:
> 
> John wrote:
> > I assume you want the VFs to be instantiated in a VM.  To do this, you
> > need to blacklist the ixgbevf driver in the host (or not compile it
> > into the host), or it will try to associate the driver in the host,
> > rather than in the VM where you want it.  Then, the VM needs the
> > ixgbevf driver, which will hopefully do a better job of talking to the
> > mailbox in the host.  There is some work to assign the VF(s) to the
> > VM, but I don't remember that offhand.
> 
> I don't have any VMs (I started this whole thing because I was looking at a PCI
> hotplug issue related to SR-IOV, so I don't really care about VMs).
> 
> So the ixgbevf driver on the *host* is claiming the new VFs, and it sounds like
> maybe it can't handle that?
> 
> Bjorn

Not to speak for John, but I believe he was saying that if you want to use your VFs in a VM, you need to make sure you don't run the ixgbevf driver on the host, as it will "claim" the VFs.  If you are NOT running any VMs, then it is perfectly fine to have both ixgbe and ixgbevf loaded.

-Don Skidmore <donald.c.skidmore@intel.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread
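
For the VM case John describes, the usual way to keep the host from auto-loading ixgbevf is a blacklist entry under /etc/modprobe.d. The following is a sketch only. The helper name blacklist_module and the file name blacklist-<module>.conf are illustrative choices rather than anything mandated by the driver or by this thread; the directory is a parameter so the helper can be tried without root.

```shell
#!/bin/sh
# blacklist_module MODULE [CONF_DIR]   (hypothetical helper)
# Writes a one-line modprobe.d blacklist file and prints its path.
blacklist_module() {
    module=$1
    conf_dir=${2:-/etc/modprobe.d}
    conf_file="$conf_dir/blacklist-$module.conf"

    # "blacklist <name>" is the modprobe.d directive that suppresses
    # alias-based automatic loading of the named module
    printf 'blacklist %s\n' "$module" > "$conf_file" || return 1
    printf '%s\n' "$conf_file"
}
```

Running blacklist_module ixgbevf as root would create /etc/modprobe.d/blacklist-ixgbevf.conf. A blacklist entry only suppresses automatic loading triggered by device aliases; an explicit "modprobe ixgbevf" still loads the module. As Don notes, none of this is needed when the VFs are meant to be used on the host itself.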

* Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages
  2013-08-23 20:37                 ` Skidmore, Donald C
@ 2013-08-23 20:42                   ` Bjorn Helgaas
  -1 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2013-08-23 20:42 UTC (permalink / raw)
  To: Skidmore, Donald C; +Cc: e1000-devel, linux-pci, linux-kernel, Don Dutile

On Fri, Aug 23, 2013 at 2:37 PM, Skidmore, Donald C
<donald.c.skidmore@intel.com> wrote:
>> -----Original Message-----
>> From: Bjorn Helgaas [mailto:bhelgaas@google.com]
>> Sent: Friday, August 23, 2013 11:53 AM
>> To: Skidmore, Donald C
>> Cc: e1000-devel@lists.sourceforge.net; linux-pci@vger.kernel.org; linux-
>> kernel@vger.kernel.org; Don Dutile
>> Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00
>> to PF Nacked" messages
>>
>> On Fri, Aug 23, 2013 at 06:25:06PM +0000, Skidmore, Donald C wrote:
>> > > -----Original Message-----
>> > > From: Bjorn Helgaas [mailto:bhelgaas@google.com]
>> > > Sent: Friday, August 23, 2013 9:53 AM
>> > > To: Skidmore, Donald C
>> > > Cc: e1000-devel@lists.sourceforge.net; linux-pci@vger.kernel.org;
>> > > linux- kernel@vger.kernel.org; Don Dutile
>> > > Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request
>> > > of type 00 to PF Nacked" messages
>> > >
>> > > On Tue, Aug 20, 2013 at 5:37 PM, Bjorn Helgaas <bhelgaas@google.com>
>> > > wrote:
>> > > > On Tue, Aug 20, 2013 at 5:08 PM, Bjorn Helgaas
>> > > > <bhelgaas@google.com>
>> > > wrote:
>> > > >> On Tue, Aug 13, 2013 at 8:23 PM, Bjorn Helgaas
>> > > >> <bhelgaas@google.com>
>> > > wrote:
>> > > >
>> > > >>> I played with this a little more and found this:
>> > > >>>
>> > > >>> 1) Magma card in z420, connected to chassis containing X540:
>> > > >>> fails (original report)
>> > > >>> 2) X540 in z420, Magma card in z420, connected to empty chassis:
>> > > >>> fails
>> > > >>> 3) X540 in z420, Magma card in z420 but no cable to chassis:
>> > > >>> works
>> > > >
>> > > > For what it's worth, I tried config 3 again with v3.11-rc6, and it
>> > > > failed the same way.  I haven't bothered with config 2.  It's not
>> > > > 100% reproducible, but at least it doesn't seem related to the
>> > > > expansion chassis.
>> > > >
>> > > > I attached the logs from config 3 to
>> > > > https://bugzilla.kernel.org/show_bug.cgi?id=60776
>> > >
>> > > Is there anything I can do to help debug this?  Add instrumentation,
>> > > etc.?  It seems like I'm doing the simplest possible thing -- just
>> > > writing to the sysfs sriov_numvfs file to enable VFs.
>> > >
>> > > I almost think it must be related to my config somehow if nobody
>> > > else is seeing this, but at the same time, my config also seems the
>> > > simplest possible, so I don't know what I could be doing that's unusual.
>> > >
>> > > Bjorn
>> >
>> > Hey Bjorn,
>> >
>> > I may be a little confused, so bear with me.
>> >
>> > Option 1 = (your normal setup) Magma card plugged into chassis, X540 in chassis.
>> > Option 2 = Magma card plugged into chassis, X540 in z420 system.
>> > Option 3 = Magma card UNplugged from chassis, X540 in z420 system.
>> >
>> > Options 1 & 2 - always fail
>> > Option 3 - sometimes fails (unsure at what rate failure occurs)
>> >
>> > Please correct me if I messed any of that up. :)
>>
>> Generally correct.  I've seen failures in all three configs, so I'm only
>> concerned with the simplest for now (config 3, no expansion chassis).
>>
>> > Another question I have relates to the lspci output you supplied in the
>> > bugzilla.  I'm not seeing the VF devices (e.g. 08:10.0).  Did you run lspci
>> > before you created the VFs?  If so, could we see one taken while the failure
>> > was occurring?
>>
>> That's correct, I collected the lspci output before reproducing the problem.  I
>> can't easily collect lspci afterwards because the machine isn't responsive
>> after the problem starts.
>>
>> > Also, could you download the latest ixgbevf from SourceForge?
>> >
>> > https://sourceforge.net/projects/e1000/files/ixgbevf%20stable/
>> >
>> > If we add debugging messages, it will be easier to patch this driver, and it
>> > contains our latest validated code base.
>>
>> I can do that if it turns out to be necessary.  But John Haller gave me a good
>> clue off-list:
>>
>> John wrote:
>> > I assume you want the VFs to be instantiated in a VM.  To do this, you
>> > need to blacklist the ixgbevf driver in the host (or not compile it
>> > into the host), or it will try to associate the driver in the host,
>> > rather than in the VM where you want it.  Then, the VM needs the
>> > ixgbevf driver, which will hopefully do a better job of talking to the
>> > mailbox in the host.  There is some work to assign the VF(s) to the
>> > VM, but I don't remember that offhand.
>>
>> I don't have any VMs (I started this whole thing because I was looking at a PCI
>> hotplug issue related to SR-IOV, so I don't really care about VMs).
>>
>> So the ixgbevf driver on the *host* is claiming the new VFs, and it sounds like
>> maybe it can't handle that?
>>
>> Bjorn
>
> Not to speak for John, but I believe he was saying that if you want to use your VFs in a VM, you need to make sure you don't run the ixgbevf driver on the host, as it will "claim" the VFs.  If you are NOT running any VMs, then it is perfectly fine to have both ixgbe and ixgbevf loaded.

OK.  It certainly *seemed* surprising to have the ixgbevf driver blow
up, even if it was an error on my part to load it in the host.  Just
let me know if there's any more testing I can do.

Bjorn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages
@ 2013-08-23 20:42                   ` Bjorn Helgaas
  0 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2013-08-23 20:42 UTC (permalink / raw)
  To: Skidmore, Donald C; +Cc: e1000-devel, linux-pci, linux-kernel, Don Dutile

On Fri, Aug 23, 2013 at 2:37 PM, Skidmore, Donald C
<donald.c.skidmore@intel.com> wrote:
>> -----Original Message-----
>> From: Bjorn Helgaas [mailto:bhelgaas@google.com]
>> Sent: Friday, August 23, 2013 11:53 AM
>> To: Skidmore, Donald C
>> Cc: e1000-devel@lists.sourceforge.net; linux-pci@vger.kernel.org; linux-
>> kernel@vger.kernel.org; Don Dutile
>> Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00
>> to PF Nacked" messages
>>
>> On Fri, Aug 23, 2013 at 06:25:06PM +0000, Skidmore, Donald C wrote:
>> > > -----Original Message-----
>> > > From: Bjorn Helgaas [mailto:bhelgaas@google.com]
>> > > Sent: Friday, August 23, 2013 9:53 AM
>> > > To: Skidmore, Donald C
>> > > Cc: e1000-devel@lists.sourceforge.net; linux-pci@vger.kernel.org;
>> > > linux- kernel@vger.kernel.org; Don Dutile
>> > > Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request
>> > > of type 00 to PF Nacked" messages
>> > >
>> > > On Tue, Aug 20, 2013 at 5:37 PM, Bjorn Helgaas <bhelgaas@google.com>
>> > > wrote:
>> > > > On Tue, Aug 20, 2013 at 5:08 PM, Bjorn Helgaas
>> > > > <bhelgaas@google.com>
>> > > wrote:
>> > > >> On Tue, Aug 13, 2013 at 8:23 PM, Bjorn Helgaas
>> > > >> <bhelgaas@google.com>
>> > > wrote:
>> > > >
>> > > >>> I played with this a little more and found this:
>> > > >>>
>> > > >>> 1) Magma card in z420, connected to chassis containing X540:
>> > > >>> fails (original report)
>> > > >>> 2) X540 in z420, Magma card in z420, connected to empty chassis:
>> > > >>> fails
>> > > >>> 3) X540 in z420, Magma card in z420 but no cable to chassis:
>> > > >>> works
>> > > >
>> > > > For what it's worth, I tried config 3 again with v3.11-rc6, and it
>> > > > failed the same way.  I haven't bothered with config 2.  It's not
>> > > > 100% reproducible, but at least it doesn't seem related to the
>> > > > expansion chassis.
>> > > >
>> > > > I attached the logs from config 3 to
>> > > > https://bugzilla.kernel.org/show_bug.cgi?id=60776
>> > >
>> > > Is there anything I can do to help debug this?  Add instrumentation,
>> > > etc.?  It seems like I'm doing the simplest possible thing -- just
>> > > writing to the sysfs sriov_num_vfs file to enable VFs.
>> > >
>> > > I almost think it must be related to my config somehow if nobody
>> > > else is seeing this, but at the same time, my config also seems the
>> > > simplest possible, so I don't know what I could be doing that's unusual.
>> > >
>> > > Bjorn
>> >
>> > Hey Bjorn,
>> >
>> > I'm may be little confused so bear with me.
>> >
>> > Option 1 = (your normal set up), Magma card plugged to chasis, X540 in
>> chasis.
>> > Option 2 = Magma card plugged to chasis, X540 in z420 system.
>> > Option 3 = Magma card UNplugged from chasis, x540 in z420 system.
>> >
>> > Options 1 & 2 - always fail
>> > Option 3 - sometimes fails (unsure at what rate failure occurs)
>> >
>> > Please correct me if I messed any of that up. :)
>>
>> Generally correct.  I've seen failures in all three configs, so I'm only
>> concerned with the simplest for now (config 3, no expansion chassis).
>>
>> > Another question I have relates to the lspci output you supplied in the
>> bugzilla.  I'm not seeing the VF devices (i.e. 08:10.0) did you run lspci before
>> you created the VF's?  If so could we see one while the failure was occurring?
>>
>> That's correct, I collected the lspci output before reproducing the problem.  I
>> can't easily collect lspci afterwards because the machine isn't responsive
>> after the problem starts.
>>
>> > Also could you download the latest ixgbevf from source forge?
>> >
>> > https://sourceforge.net/projects/e1000/files/ixgbevf%20stable/
>> >
>> > If we add debugging messages it will be easier to patch this driver, and it
>> > contains our latest validated code base.
>>
>> I can do that if it turns out to be necessary.  But John Haller gave me a good
>> clue off-list:
>>
>> John wrote:
>> > I assume you want the VFs to be instantiated in a VM.  To do this, you
>> > need to blacklist the ixgbevf driver in the host (or not compile it
>> > into the host), or it will try to associate the driver in the host,
>> > rather than in the VM where you want it.  Then, the VM needs the
>> > ixgbevf driver, which will hopefully do a better job of talking to the
>> > mailbox in the host.  There is some work to assign the VF(s) to the
>> > VM, but I don't remember that offhand.
>>
>> I don't have any VMs (I started this whole thing because I was looking at a PCI
>> hotplug issue related to SR-IOV, so I don't really care about VMs).
>>
>> So the ixgbevf driver on the *host* is claiming the new VFs, and it sounds like
>> maybe it can't handle that?
>>
>> Bjorn
>
> Not to speak for John, but I believe he was saying that if you want to use your VFs in a VM, you need to make sure you don't run the ixgbevf driver on the host, as it will "claim" the VFs.  If you are NOT running any VMs, then it is perfectly fine to have both ixgbe and ixgbevf loaded.

OK.  It certainly *seemed* surprising to have the ixgbevf driver blow
up, even if it was an error on my part to load it in the host.  Just
let me know if there's any more testing I can do.
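
For readers following along: one way to confirm which driver has claimed
each VF on the host is to look at the sysfs "driver" symlinks.  The
following is a minimal Python sketch, not something posted in this thread.
It assumes the usual sysfs layout (virtfnN links under the PF directory and
a driver link under each bound function); the function names are
hypothetical.

```python
import os


def bound_driver(device_dir):
    """Return the name of the driver bound to a PCI function, or None."""
    link = os.path.join(device_dir, "driver")
    if not os.path.islink(link):
        # No driver symlink means no driver has claimed this function.
        return None
    return os.path.basename(os.readlink(link))


def vf_drivers(pf_dir):
    """Map each VF (by PCI address) under a PF to its bound driver."""
    result = {}
    for entry in sorted(os.listdir(pf_dir)):
        if entry.startswith("virtfn"):
            # virtfnN is a symlink to the VF's own device directory.
            vf_dir = os.path.realpath(os.path.join(pf_dir, entry))
            result[os.path.basename(vf_dir)] = bound_driver(vf_dir)
    return result


# Example (path taken from the original report):
# vf_drivers("/sys/bus/pci/devices/0000:08:00.0")
```

If every VF reports ixgbevf here, the host driver has claimed them, which
is the situation discussed above.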

Bjorn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages
  2013-08-23 20:42                   ` Bjorn Helgaas
@ 2013-08-23 21:41                     ` Skidmore, Donald C
  -1 siblings, 0 replies; 28+ messages in thread
From: Skidmore, Donald C @ 2013-08-23 21:41 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: e1000-devel, linux-pci, linux-kernel, Don Dutile

> -----Original Message-----
> From: Bjorn Helgaas [mailto:bhelgaas@google.com]
> Sent: Friday, August 23, 2013 1:43 PM
> To: Skidmore, Donald C
> Cc: e1000-devel@lists.sourceforge.net; linux-pci@vger.kernel.org; linux-
> kernel@vger.kernel.org; Don Dutile
> Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00
> to PF Nacked" messages
> 
> On Fri, Aug 23, 2013 at 2:37 PM, Skidmore, Donald C
> <donald.c.skidmore@intel.com> wrote:
> >> -----Original Message-----
> >> From: Bjorn Helgaas [mailto:bhelgaas@google.com]
> >> Sent: Friday, August 23, 2013 11:53 AM
> >> To: Skidmore, Donald C
> >> Cc: e1000-devel@lists.sourceforge.net; linux-pci@vger.kernel.org;
> >> linux- kernel@vger.kernel.org; Don Dutile
> >> Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of
> >> type 00 to PF Nacked" messages
> >>
> >> On Fri, Aug 23, 2013 at 06:25:06PM +0000, Skidmore, Donald C wrote:
> >> > > -----Original Message-----
> >> > > From: Bjorn Helgaas [mailto:bhelgaas@google.com]
> >> > > Sent: Friday, August 23, 2013 9:53 AM
> >> > > To: Skidmore, Donald C
> >> > > Cc: e1000-devel@lists.sourceforge.net; linux-pci@vger.kernel.org;
> >> > > linux- kernel@vger.kernel.org; Don Dutile
> >> > > Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last
> >> > > Request of type 00 to PF Nacked" messages
> >> > >
> >> > > On Tue, Aug 20, 2013 at 5:37 PM, Bjorn Helgaas
> >> > > <bhelgaas@google.com>
> >> > > wrote:
> >> > > > On Tue, Aug 20, 2013 at 5:08 PM, Bjorn Helgaas
> >> > > > <bhelgaas@google.com>
> >> > > wrote:
> >> > > >> On Tue, Aug 13, 2013 at 8:23 PM, Bjorn Helgaas
> >> > > >> <bhelgaas@google.com>
> >> > > wrote:
> >> > > >
> >> > > >>> I played with this a little more and found this:
> >> > > >>>
> >> > > >>> 1) Magma card in z420, connected to chassis containing X540:
> >> > > >>> fails (original report)
> >> > > >>> 2) X540 in z420, Magma card in z420, connected to empty chassis:
> >> > > >>> fails
> >> > > >>> 3) X540 in z420, Magma card in z420 but no cable to chassis:
> >> > > >>> works
> >> > > >
> >> > > > For what it's worth, I tried config 3 again with v3.11-rc6, and
> >> > > > it failed the same way.  I haven't bothered with config 2.
> >> > > > It's not 100% reproducible, but at least it doesn't seem
> >> > > > related to the expansion chassis.
> >> > > >
> >> > > > I attached the logs from config 3 to
> >> > > > https://bugzilla.kernel.org/show_bug.cgi?id=60776
> >> > >
> >> > > Is there anything I can do to help debug this?  Add
> >> > > instrumentation, etc.?  It seems like I'm doing the simplest
> >> > > possible thing -- just writing to the sysfs sriov_numvfs file to
> >> > > enable VFs.
> >> > >
> >> > > I almost think it must be related to my config somehow if nobody
> >> > > else is seeing this, but at the same time, my config also seems
> >> > > the simplest possible, so I don't know what I could be doing that's
> unusual.
> >> > >
> >> > > Bjorn
> >> >
> >> > Hey Bjorn,
> >> >
> >> > I may be a little confused, so bear with me.
> >> >
> >> > Option 1 = (your normal setup), Magma card plugged into chassis, X540
> >> > in chassis.
> >> > Option 2 = Magma card plugged into chassis, X540 in z420 system.
> >> > Option 3 = Magma card UNplugged from chassis, X540 in z420 system.
> >> >
> >> > Options 1 & 2 - always fail
> >> > Option 3 - sometimes fails (unsure at what rate failure occurs)
> >> >
> >> > Please correct me if I messed any of that up. :)
> >>
> >> Generally correct.  I've seen failures in all three configs, so I'm
> >> only concerned with the simplest for now (config 3, no expansion chassis).
> >>
> >> > Another question I have relates to the lspci output you supplied in
> >> > the
> >> bugzilla.  I'm not seeing the VF devices (i.e. 08:10.0) did you run
> >> lspci before you created the VF's?  If so could we see one while the failure
> was occurring?
> >>
> >> That's correct, I collected the lspci output before reproducing the
> >> problem.  I can't easily collect lspci afterwards because the machine
> >> isn't responsive after the problem starts.
> >>
> >> > Also could you download the latest ixgbevf from source forge?
> >> >
> >> > https://sourceforge.net/projects/e1000/files/ixgbevf%20stable/
> >> >
> >> > If we add debugging messages it will be easier to patch this driver
> >> > and it
> >> contains our latest validated code base.
> >>
> >> I can do that if it turns out to be necessary.  But John Haller gave
> >> me a good clue off-list:
> >>
> >> John wrote:
> >> > I assume you want the VFs to be instantiated in a VM.  To do this,
> >> > you need to blacklist the ixgbevf driver in the host (or not
> >> > compile it into the host), or it will try to associate the driver
> >> > in the host, rather than in the VM where you want it.  Then, the VM
> >> > needs the ixgbevf driver, which will hopefully do a better job of
> >> > talking to the mailbox in the host.  There is some work to assign
> >> > the VF(s) to the VM, but I don't remember that offhand.
> >>
> >> I don't have any VMs (I started this whole thing because I was
> >> looking at a PCI hotplug issue related to SR-IOV, so I don't really care
> about VMs).
> >>
> >> So the ixgbevf driver on the *host* is claiming the new VFs, and it
> >> sounds like maybe it can't handle that?
> >>
> >> Bjorn
> >
> > Not to speak for John, but I believe he was saying that if you want to use
> > your VFs in a VM, you need to make sure you don't run the ixgbevf driver on
> > the host, as it will "claim" the VFs.  If you are NOT running any VMs, then
> > it is perfectly fine to have both ixgbe and ixgbevf loaded.
> 
> OK.  It certainly *seemed* surprising to have the ixgbevf driver blow up,
> even if it was an error on my part to load it in the host.  Just let me know if
> there's any more testing I can do.
> 
> Bjorn

Something is corrupting the mbx (mailbox) messages, as evidenced by the "Last Request of type 03 to PF Nacked" messages.  Have you tried resetting the ixgbevf port (ethtool -r <your port>)?  Is that even possible, given that you mentioned the machine isn't very responsive in the failure state?

It might be worthwhile to add logging to the ixgbevf and ixgbe drivers around the mbx messages, in the hope that it would show what is going on between the two.  There have been some recent changes in that area of the ixgbevf code, so the latest SourceForge driver would be the easiest base for me to send you a patch against.  Sadly, we haven't been able to recreate the failure here, which makes it rather hard to debug.
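
Before adding driver instrumentation, it can help to summarize which VFs
and request types are being Nacked in an existing console log.  The
following is a minimal Python sketch under stated assumptions: the message
layout is copied from the log lines quoted in this thread, the request type
is assumed to be printed as two hex digits, and the function name is
hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   ixgbevf 0000:08:10.2: Last Request of type 03 to PF Nacked
NACK_RE = re.compile(
    r"ixgbevf (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Last Request of type (?P<type>[0-9a-f]{2}) to PF Nacked"
)


def count_nacks(log_text):
    """Count Nacked requests per (VF address, request type) pair."""
    counts = Counter()
    for match in NACK_RE.finditer(log_text):
        counts[(match.group("bdf"), match.group("type"))] += 1
    return counts
```

Feeding it the saved console log would show whether a single VF or a
single request type dominates the stream.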

Thanks,
-Don Skidmore <donald.c.skidmore@intel.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages
  2013-08-23 21:41                     ` Skidmore, Donald C
@ 2013-08-27 23:01                       ` Bjorn Helgaas
  -1 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2013-08-27 23:01 UTC (permalink / raw)
  To: Skidmore, Donald C; +Cc: e1000-devel, linux-pci, linux-kernel, Don Dutile

On Fri, Aug 23, 2013 at 3:41 PM, Skidmore, Donald C
<donald.c.skidmore@intel.com> wrote:
>> -----Original Message-----
>> From: Bjorn Helgaas [mailto:bhelgaas@google.com]
>> Sent: Friday, August 23, 2013 1:43 PM
>> To: Skidmore, Donald C
>> Cc: e1000-devel@lists.sourceforge.net; linux-pci@vger.kernel.org; linux-
>> kernel@vger.kernel.org; Don Dutile
>> Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00
>> to PF Nacked" messages
>>
>> On Fri, Aug 23, 2013 at 2:37 PM, Skidmore, Donald C
>> <donald.c.skidmore@intel.com> wrote:
>> >> -----Original Message-----
>> >> From: Bjorn Helgaas [mailto:bhelgaas@google.com]
>> >> Sent: Friday, August 23, 2013 11:53 AM
>> >> To: Skidmore, Donald C
>> >> Cc: e1000-devel@lists.sourceforge.net; linux-pci@vger.kernel.org;
>> >> linux- kernel@vger.kernel.org; Don Dutile
>> >> Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of
>> >> type 00 to PF Nacked" messages
>> >>
>> >> On Fri, Aug 23, 2013 at 06:25:06PM +0000, Skidmore, Donald C wrote:
>> >> > > -----Original Message-----
>> >> > > From: Bjorn Helgaas [mailto:bhelgaas@google.com]
>> >> > > Sent: Friday, August 23, 2013 9:53 AM
>> >> > > To: Skidmore, Donald C
>> >> > > Cc: e1000-devel@lists.sourceforge.net; linux-pci@vger.kernel.org;
>> >> > > linux- kernel@vger.kernel.org; Don Dutile
>> >> > > Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last
>> >> > > Request of type 00 to PF Nacked" messages
>> >> > >
>> >> > > On Tue, Aug 20, 2013 at 5:37 PM, Bjorn Helgaas
>> >> > > <bhelgaas@google.com>
>> >> > > wrote:
>> >> > > > On Tue, Aug 20, 2013 at 5:08 PM, Bjorn Helgaas
>> >> > > > <bhelgaas@google.com>
>> >> > > wrote:
>> >> > > >> On Tue, Aug 13, 2013 at 8:23 PM, Bjorn Helgaas
>> >> > > >> <bhelgaas@google.com>
>> >> > > wrote:
>> >> > > >
>> >> > > >>> I played with this a little more and found this:
>> >> > > >>>
>> >> > > >>> 1) Magma card in z420, connected to chassis containing X540:
>> >> > > >>> fails (original report)
>> >> > > >>> 2) X540 in z420, Magma card in z420, connected to empty chassis:
>> >> > > >>> fails
>> >> > > >>> 3) X540 in z420, Magma card in z420 but no cable to chassis:
>> >> > > >>> works
>> >> > > >
>> >> > > > For what it's worth, I tried config 3 again with v3.11-rc6, and
>> >> > > > it failed the same way.  I haven't bothered with config 2.
>> >> > > > It's not 100% reproducible, but at least it doesn't seem
>> >> > > > related to the expansion chassis.
>> >> > > >
>> >> > > > I attached the logs from config 3 to
>> >> > > > https://bugzilla.kernel.org/show_bug.cgi?id=60776
>> >> > >
>> >> > > Is there anything I can do to help debug this?  Add
>> >> > > instrumentation, etc.?  It seems like I'm doing the simplest
>> >> > > possible thing -- just writing to the sysfs sriov_numvfs file to
>> >> > > enable VFs.
>> >> > >
>> >> > > I almost think it must be related to my config somehow if nobody
>> >> > > else is seeing this, but at the same time, my config also seems
>> >> > > the simplest possible, so I don't know what I could be doing that's
>> unusual.
>> >> > >
>> >> > > Bjorn
>> >> >
>> >> > Hey Bjorn,
>> >> >
>> >> > I may be a little confused, so bear with me.
>> >> >
>> >> > Option 1 = (your normal setup), Magma card plugged into chassis, X540
>> >> > in chassis.
>> >> > Option 2 = Magma card plugged into chassis, X540 in z420 system.
>> >> > Option 3 = Magma card UNplugged from chassis, X540 in z420 system.
>> >> >
>> >> > Options 1 & 2 - always fail
>> >> > Option 3 - sometimes fails (unsure at what rate failure occurs)
>> >> >
>> >> > Please correct me if I messed any of that up. :)
>> >>
>> >> Generally correct.  I've seen failures in all three configs, so I'm
>> >> only concerned with the simplest for now (config 3, no expansion chassis).
>> >>
>> >> > Another question I have relates to the lspci output you supplied in
>> >> > the
>> >> bugzilla.  I'm not seeing the VF devices (i.e. 08:10.0) did you run
>> >> lspci before you created the VF's?  If so could we see one while the failure
>> was occurring?
>> >>
>> >> That's correct, I collected the lspci output before reproducing the
>> >> problem.  I can't easily collect lspci afterwards because the machine
>> >> isn't responsive after the problem starts.
>> >>
>> >> > Also could you download the latest ixgbevf from source forge?
>> >> >
>> >> > https://sourceforge.net/projects/e1000/files/ixgbevf%20stable/
>> >> >
>> >> > If we add debugging messages it will be easier to patch this driver
>> >> > and it
>> >> contains our latest validated code base.
>> >>
>> >> I can do that if it turns out to be necessary.  But John Haller gave
>> >> me a good clue off-list:
>> >>
>> >> John wrote:
>> >> > I assume you want the VFs to be instantiated in a VM.  To do this,
>> >> > you need to blacklist the ixgbevf driver in the host (or not
>> >> > compile it into the host), or it will try to associate the driver
>> >> > in the host, rather than in the VM where you want it.  Then, the VM
>> >> > needs the ixgbevf driver, which will hopefully do a better job of
>> >> > talking to the mailbox in the host.  There is some work to assign
>> >> > the VF(s) to the VM, but I don't remember that offhand.
>> >>
>> >> I don't have any VMs (I started this whole thing because I was
>> >> looking at a PCI hotplug issue related to SR-IOV, so I don't really care
>> about VMs).
>> >>
>> >> So the ixgbevf driver on the *host* is claiming the new VFs, and it
>> >> sounds like maybe it can't handle that?
>> >>
>> >> Bjorn
>> >
>> > Not to speak for John, but I believe he was saying that if you want to use
>> > your VFs in a VM, you need to make sure you don't run the ixgbevf driver on
>> > the host, as it will "claim" the VFs.  If you are NOT running any VMs, then
>> > it is perfectly fine to have both ixgbe and ixgbevf loaded.
>>
>> OK.  It certainly *seemed* surprising to have the ixgbevf driver blow up,
>> even if it was an error on my part to load it in the host.  Just let me know if
>> there's any more testing I can do.
>>
>> Bjorn
>
> Something is corrupting the mbx (mailbox) messages, as evidenced by the "Last Request of type 03 to PF Nacked" messages.  Have you tried resetting the ixgbevf port (ethtool -r <your port>)?  Is that even possible, given that you mentioned the machine isn't very responsive in the failure state?
>
> It might be worthwhile to add logging to the ixgbevf and ixgbe drivers around the mbx messages, in the hope that it would show what is going on between the two.  There have been some recent changes in that area of the ixgbevf code, so the latest SourceForge driver would be the easiest base for me to send you a patch against.  Sadly, we haven't been able to recreate the failure here, which makes it rather hard to debug.

I haven't been able to reproduce the problem with the 2.10.3 ixgbevf
driver from http://sourceforge.net/projects/e1000/files/ixgbevf%20stable/

I did notice what looks like a printk format problem and what appears
to be a bare MAC address with no label:

[  316.699504] ixgbevf: eth%d: ixgbevf_init_interrupt_scheme:
Multiqueue Disabled: Rx Queue count = 1, Tx Queue count = 1
[  316.710897] ixgbevf: eth3: ixgbevf_probe: Intel(R) X540 Virtual Function
[  316.717608] 08:88:ff:ff:0d:ec
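
Both oddities above can be spotted mechanically in a saved log.  The
following is a minimal Python sketch, assuming dmesg-style "[ seconds ]"
timestamps as in the excerpt; the regular expressions and the function name
are illustrative, not part of any driver or tool.

```python
import re

# Leading dmesg timestamp, e.g. "[  316.699504] ".
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# A printf-style conversion that reached the log unexpanded, e.g. "eth%d".
FORMAT_RE = re.compile(r"%[0-9.]*[dsuxp]")
# A line consisting of nothing but a MAC address.
MAC_RE = re.compile(r"^(?:[0-9a-f]{2}:){5}[0-9a-f]{2}$", re.IGNORECASE)


def suspicious_lines(log_text):
    """Return (kind, message) pairs for lines that look malformed."""
    findings = []
    for line in log_text.splitlines():
        msg = TIMESTAMP_RE.sub("", line).strip()
        if FORMAT_RE.search(msg):
            findings.append(("unexpanded-format", msg))
        elif MAC_RE.match(msg):
            findings.append(("bare-mac", msg))
    return findings
```

On the three log lines above it would flag the "eth%d" line and the
unlabeled MAC address line, and leave the ixgbevf_probe line alone.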

Sorry for wasting so much time on something that appears to be already fixed.

Bjorn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages
  2013-08-27 23:01                       ` Bjorn Helgaas
@ 2013-09-12 22:26                         ` Bjorn Helgaas
  -1 siblings, 0 replies; 28+ messages in thread
From: Bjorn Helgaas @ 2013-09-12 22:26 UTC (permalink / raw)
  To: Skidmore, Donald C; +Cc: e1000-devel, linux-pci, linux-kernel, Don Dutile

On Tue, Aug 27, 2013 at 5:01 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:

> I haven't been able to reproduce the problem with the 2.10.3 ixgbevf
> driver from http://sourceforge.net/projects/e1000/files/ixgbevf%20stable/
> ...
> Sorry for wasting so much time on something that appears to be already fixed.

I just tried the brand-new v3.11, and the usual, trivial:

  # echo -n 8 > /sys/bus/pci/devices/0000:04:00.0/sriov_numvfs

was enough to blow up my box the same old boring way (infinite string
of "ixgbevf 0000:04:11.0: Last Request of type 03 to PF Nacked"
messages).
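
For anyone trying to quantify a flood like this, a minimal sketch along the following lines could tally the messages per VF and request type.  The regular expression is modeled only on the log lines quoted in this thread, and the function name count_nacks and the script name in the usage note are hypothetical; none of this is part of the ixgbe or ixgbevf drivers.

```python
import re
import sys
from collections import Counter

# Modeled on the lines quoted in this thread, e.g.:
#   ixgbevf 0000:04:11.0: Last Request of type 03 to PF Nacked
# Assumption: the type field is always printed as two hex digits.
NACK_RE = re.compile(
    r"ixgbevf (?P<dev>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"Last Request of type (?P<type>[0-9a-fA-F]{2}) to PF Nacked"
)


def count_nacks(lines):
    """Count Nacked messages, keyed by (PCI address, request type)."""
    counts = Counter()
    for line in lines:
        match = NACK_RE.search(line)
        if match:
            counts[(match.group("dev"), match.group("type"))] += 1
    return counts


if __name__ == "__main__":
    # Read kernel log text from stdin and print one summary line per key.
    for (dev, msg_type), n in sorted(count_nacks(sys.stdin).items()):
        print(f"{dev} type {msg_type}: {n}")
```

Saved under a name such as count_nacks.py, it could be fed from the kernel log with "dmesg | python3 count_nacks.py", which shows whether every VF is affected or only some of them.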

I guess this is because v3.11 still includes the 2.7.12-k ixgbevf
driver, not the apparently-fixed 2.10.3 version from your sourceforge
page.

According to sourceforge, 2.7.12 was released almost a YEAR ago, on
2012-10-18, and 2.10.3 was released 2013-07-26.  Why isn't 2.10.3 in
v3.11?

Don't you guys care that it is so easy to blow up your driver with the
mainline kernel?  I'm quite frustrated by how much time I've wasted on
this issue.

I do not think that defending yourself with "please try the latest
driver from sourceforge" is a reasonable or friendly way to work in
the Linux community.

Bjorn

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00 to PF Nacked" messages
  2013-09-12 22:26                         ` Bjorn Helgaas
@ 2013-09-13  0:18                           ` Skidmore, Donald C
  -1 siblings, 0 replies; 28+ messages in thread
From: Skidmore, Donald C @ 2013-09-13  0:18 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: e1000-devel, linux-pci, linux-kernel, Don Dutile

> -----Original Message-----
> From: Bjorn Helgaas [mailto:bhelgaas@google.com]
> Sent: Thursday, September 12, 2013 3:27 PM
> To: Skidmore, Donald C
> Cc: e1000-devel@lists.sourceforge.net; linux-pci@vger.kernel.org;
> linux-kernel@vger.kernel.org; Don Dutile
> Subject: Re: [E1000-devel] 3.11-rc4 ixgbevf: endless "Last Request of type 00
> to PF Nacked" messages
> 
> On Tue, Aug 27, 2013 at 5:01 PM, Bjorn Helgaas <bhelgaas@google.com>
> wrote:
> 
> > I haven't been able to reproduce the problem with the 2.10.3 ixgbevf
> > driver from
> > http://sourceforge.net/projects/e1000/files/ixgbevf%20stable/
> > ...
> > Sorry for wasting so much time on something that appears to be already
> > fixed.
> 
> I just tried the brand-new v3.11, and the usual, trivial:
> 
>   # echo -n 8 > /sys/bus/pci/devices/0000:04:00.0/sriov_numvfs
> 
> was enough to blow up my box the same old boring way (infinite string of
> "ixgbevf 0000:04:11.0: Last Request of type 03 to PF Nacked"
> messages).
> 
> I guess this is because v3.11 still includes the 2.7.12-k ixgbevf driver, not the
> apparently-fixed 2.10.3 version from your sourceforge page.
> 
> According to sourceforge, 2.7.12 was released almost a YEAR ago, on
> 2012-10-18, and 2.10.3 was released 2013-07-26.  Why isn't 2.10.3 in v3.11?
> 
> Don't you guys care that it is so easy to blow up your driver with the mainline
> kernel?  I'm quite frustrated by how much time I've wasted on this issue.
> 
> I do not think that defending yourself with "please try the latest driver from
> sourceforge" is a reasonable or friendly way to work in the Linux community.
> 
> Bjorn

You're right, I haven't been keeping the version strings up to date with our latest upstream pushes.  I was hoping to reach a sync point where both drivers (upstream and out-of-tree) were closer before I bumped the upstream version.  The relationship between the ixgbevf version number in the upstream kernel and the one in our out-of-tree driver is not quite as straightforward as the version strings would suggest.  We have pushed quite a few patches since the last version bump a year ago, and in fact we attempt to push patches upstream in parallel with any changes we make in the out-of-tree driver.  But depending on quite a list of events (testing, release schedule, when net-next is open), one driver can receive patches earlier or later than the other.  Also, ixgbevf is currently going through a fair amount of refactoring to bring it more up to date with ixgbe, so there are a fair number of patches currently in play.  The reason I suggested you try the out-of-tree driver (SourceForge) is that I knew it was currently a bit more up to date.

The good news is that if the latest out-of-tree driver is correcting your problem, the fix is most likely en route to upstream.  Likewise, I can send you some of the patches that are in the out-of-tree driver but are still waiting to be sent upstream, if you would like to try them.  Some of them touch code around the mbx messages, and as I mentioned in an earlier email, the error message you're seeing seems to imply something has gone wrong there.  But since we can't seem to recreate your failure locally, I can't know for sure.

Thanks,
-Don Skidmore <donald.c.skidmore@intel.com>

^ permalink raw reply	[flat|nested] 28+ messages in thread

