* igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer @ 2014-01-30 19:08 Zoltan Kiss 2014-01-30 20:34 ` Michael Chan ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: Zoltan Kiss @ 2014-01-30 19:08 UTC (permalink / raw) To: Jeff Kirsher, Jesse Brandeburg, Bruce Allan, Carolyn Wyborny, Don Skidmore, Greg Rose, Peter P Waskiewicz Jr, Alex Duyck, John Ronciak, Tushar Dave, Akeem G Abodunrin, David S. Miller, e1000-devel, netdev, linux-kernel, Michael Chan, xen-devel Hi, I've experienced some queue timeout problems mentioned in the subject with igb and bnx2 cards. I haven't seen them on other cards so far. I'm using XenServer with 3.10 Dom0 kernel (however igb were already updated to latest version), and there are Windows guests sending data through these cards. I noticed these problems in XenRT test runs, and I know that they usually mean some lost interrupt problem or other hardware error, but in my case they started to appear more often, and they are likely connected to my netback grant mapping patches. These patches causing skb's with huge (~64kb) linear buffers to appear more often. The reason for that is an old problem in the ring protocol: originally the maximum amount of slots were linked to MAX_SKB_FRAGS, as every slot ended up as a frag of the skb. When this value were changed, netback had to cope with the situation by coalescing the packets into fewer frags. My patch series take a different approach: the leftover slots (pages) were assigned to a new skb's frags, and that skb were stashed to the frag_list of the first one. Then, before sending it off to the stack it calls skb = skb_copy_expand(skb, 0, 0, GFP_ATOMIC, __GFP_NOWARN), which basically creates a new skb and copied all the data into it. As far as I understood, it put everything into the linear buffer, which can amount to 64KB at most. The original skb are freed then, and this new one were sent to the stack. I suspect that this is the problem as it only happens when guests send too much slots. Does anyone familiar with these drivers have seen such issue before? (when these kind of skb's get stucked in the queue) Regards, Zoltan Kiss ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer 2014-01-30 19:08 igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer Zoltan Kiss @ 2014-01-30 20:34 ` Michael Chan 2014-01-31 13:29 ` Zoltan Kiss 2014-01-31 18:56 ` Wei Liu 2014-02-12 17:13 ` Zoltan Kiss 2 siblings, 1 reply; 11+ messages in thread From: Michael Chan @ 2014-01-30 20:34 UTC (permalink / raw) To: Zoltan Kiss Cc: Jeff Kirsher, Jesse Brandeburg, Bruce Allan, Carolyn Wyborny, Don Skidmore, Greg Rose, Peter P Waskiewicz Jr, Alex Duyck, John Ronciak, Tushar Dave, Akeem G Abodunrin, David S. Miller, e1000-devel, netdev, linux-kernel, xen-devel On Thu, 2014-01-30 at 19:08 +0000, Zoltan Kiss wrote: > I've experienced some queue timeout problems mentioned in the subject > with igb and bnx2 cards. Please provide the full tx timeout dmesg. bnx2 dumps some diagnostic information during tx timeout that may be useful. Thanks. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer 2014-01-30 20:34 ` Michael Chan @ 2014-01-31 13:29 ` Zoltan Kiss 2014-02-04 19:47 ` Michael Chan 0 siblings, 1 reply; 11+ messages in thread From: Zoltan Kiss @ 2014-01-31 13:29 UTC (permalink / raw) To: Michael Chan, Zoltan Kiss Cc: Jeff Kirsher, Jesse Brandeburg, Bruce Allan, Carolyn Wyborny, Don Skidmore, Greg Rose, Peter P Waskiewicz Jr, Alex Duyck, John Ronciak, Tushar Dave, Akeem G Abodunrin, David S. Miller, e1000-devel, netdev, linux-kernel, xen-devel On 30/01/14 21:34, Michael Chan wrote: > On Thu, 2014-01-30 at 19:08 +0000, Zoltan Kiss wrote: >> I've experienced some queue timeout problems mentioned in the subject >> with igb and bnx2 cards. > Please provide the full tx timeout dmesg. bnx2 dumps some diagnostic > information during tx timeout that may be useful. Thanks. Hi, Here is some: [ 5417.275463] ------------[ cut here ]------------ [ 5417.275472] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x156/0x1f0() [ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out [ 5417.275476] Modules linked in: tun nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 nfs fscache lockd sunrpc ipv6 openvswitch(O) ipt_REJECT nf_conntrack_ipv rack xt_tcpudp iptable_filter ip_tables x_tables nls_utf8 isofs dm_multipath scsi_dh dm_mod dcdbas coretemp microcode psmouse serio_raw lpc_ich mfd_core hid_generic ehci_p sg hed bnx2 usbhid hid sr_mod cdrom sd_mod pata_acpi ata_generic ata_piix libata uhci_hcd mptsas mptscsih mptbase scsi_transport_sas scsi_mod [ 5417.275517] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G O 3.10.11-0.xs1.8.50.170.377582 #1 [ 5417.275518] Hardware name: Dell Inc. PowerEdge R710/00W9X3, BIOS 1.2.6 07/17/2009 [ 5417.275520] 000000ff f008be08 c1488c53 f008be30 c1046664 c1658a88 f008be5c 000000ff [ 5417.275525] c13fc146 c13fc146 ee96a000 00000002 00137d44 f008be48 c1046723 00000009 [ 5417.275530] f008be40 c1658a88 f008be5c f008be80 c13fc146 c16556e1 000000ff c1658a88 [ 5417.275535] Call Trace: [ 5417.275539] [<c1488c53>] dump_stack+0x16/0x1b [ 5417.275544] [<c1046664>] warn_slowpath_common+0x64/0x80 [ 5417.275546] [<c13fc146>] ? dev_watchdog+0x156/0x1f0 [ 5417.275549] [<c13fc146>] ? dev_watchdog+0x156/0x1f0 [ 5417.275551] [<c1046723>] warn_slowpath_fmt+0x33/0x40 [ 5417.275554] [<c13fc146>] dev_watchdog+0x156/0x1f0 [ 5417.275559] [<c10549ce>] call_timer_fn+0x3e/0xf0 [ 5417.275563] [<c107293e>] ? finish_task_switch+0x4e/0xb0 [ 5417.275565] [<c13fbff0>] ? __netdev_watchdog_up+0x60/0x60 [ 5417.275568] [<c1055c1b>] run_timer_softirq+0x1ab/0x210 [ 5417.275571] [<c13fbff0>] ? __netdev_watchdog_up+0x60/0x60 [ 5417.275574] [<c104e3f4>] __do_softirq+0xc4/0x200 [ 5417.275577] [<c1493547>] ? xen_do_upcall+0x7/0xc [ 5417.275579] [<c104e550>] run_ksoftirqd+0x20/0x50 [ 5417.275582] [<c106f182>] smpboot_thread_fn+0x142/0x150 [ 5417.275586] [<c1067a2b>] kthread+0x9b/0xa0 [ 5417.275589] [<c106f040>] ? smpboot_create_threads+0x60/0x60 [ 5417.275591] [<c1070000>] ? cpu_rt_runtime_read+0x40/0x80 [ 5417.275594] [<c1492f77>] ret_from_kernel_thread+0x1b/0x28 [ 5417.275596] [<c1067990>] ? kthread_freezable_should_stop+0x60/0x60 [ 5417.275599] ---[ end trace 691f572d388226ca ]--- [ 5417.275602] bnx2 0000:01:00.1 eth1: <--- start FTQ dump ---> [ 5417.275622] bnx2 0000:01:00.1 eth1: RV2P_PFTQ_CTL 00010000 [ 5417.275629] bnx2 0000:01:00.1 eth1: RV2P_TFTQ_CTL 00020000 [ 5417.275636] bnx2 0000:01:00.1 eth1: RV2P_MFTQ_CTL 00004000 [ 5417.275643] bnx2 0000:01:00.1 eth1: TBDR_FTQ_CTL 00004002 [ 5417.275650] bnx2 0000:01:00.1 eth1: TDMA_FTQ_CTL 00010002 [ 5417.275657] bnx2 0000:01:00.1 eth1: TXP_FTQ_CTL 00010000 [ 5417.275663] bnx2 0000:01:00.1 eth1: TXP_FTQ_CTL 00010000 [ 5417.275670] bnx2 0000:01:00.1 eth1: TPAT_FTQ_CTL 00010000 [ 5417.275677] bnx2 0000:01:00.1 eth1: RXP_CFTQ_CTL 00008000 [ 5417.275684] bnx2 0000:01:00.1 eth1: RXP_FTQ_CTL 00100000 [ 5417.275690] bnx2 0000:01:00.1 eth1: COM_COMXQ_FTQ_CTL 00010000 [ 5417.275698] bnx2 0000:01:00.1 eth1: COM_COMTQ_FTQ_CTL 00020000 [ 5417.275705] bnx2 0000:01:00.1 eth1: COM_COMQ_FTQ_CTL 00010000 [ 5417.275712] bnx2 0000:01:00.1 eth1: CP_CPQ_FTQ_CTL 00004000 [ 5417.275718] bnx2 0000:01:00.1 eth1: CPU states: [ 5417.275730] bnx2 0000:01:00.1 eth1: 045000 mode b84c state 80001000 evt_mask 500 pc 8001284 pc 8001284 instr 1440fffc [ 5417.275746] bnx2 0000:01:00.1 eth1: 085000 mode b84c state 80005000 evt_mask 500 pc 8000a54 pc 8000a5c instr 10400016 [ 5417.275785] bnx2 0000:01:00.1 eth1: 0c5000 mode b84c state 80001000 evt_mask 500 pc 8004c20 pc 8004c20 instr 32050003 [ 5417.275801] bnx2 0000:01:00.1 eth1: 105000 mode b8cc state 80000000 evt_mask 500 pc 8000a8c pc 8000a94 instr 8c420020 [ 5417.275817] bnx2 0000:01:00.1 eth1: 145000 mode b880 state 80000000 evt_mask 500 pc 8000ab0 pc 800d1e8 instr 27bd0020 [ 5417.275834] bnx2 0000:01:00.1 eth1: 185000 mode b8cc state 80000000 evt_mask 500 pc 8000cb0 pc 8000930 instr 8ce800e8 [ 5417.275845] bnx2 0000:01:00.1 eth1: <--- end FTQ dump ---> [ 5417.275851] bnx2 0000:01:00.1 eth1: <--- start TBDC dump ---> [ 5417.275858] bnx2 0000:01:00.1 eth1: TBDC free cnt: 32 [ 5417.275864] bnx2 0000:01:00.1 eth1: LINE CID BIDX CMD VALIDS [ 5417.275875] bnx2 0000:01:00.1 eth1: 00 001080 17c8 00 [0] [ 5417.275886] bnx2 0000:01:00.1 eth1: 01 001080 17e0 00 [0] [ 5417.275897] bnx2 0000:01:00.1 eth1: 02 001080 17e8 00 [0] [ 5417.275907] bnx2 0000:01:00.1 eth1: 03 001080 17f8 00 [0] [ 5417.275918] bnx2 0000:01:00.1 eth1: 04 001080 1800 00 [0] [ 5417.275929] bnx2 0000:01:00.1 eth1: 05 001080 17d0 00 [0] [ 5417.275940] bnx2 0000:01:00.1 eth1: 06 001080 17d8 00 [0] [ 5417.275951] bnx2 0000:01:00.1 eth1: 07 001080 17f0 00 [0] [ 5417.275961] bnx2 0000:01:00.1 eth1: 08 001080 1620 00 [0] [ 5417.275972] bnx2 0000:01:00.1 eth1: 09 17de00 fbf8 78 [0] [ 5417.275983] bnx2 0000:01:00.1 eth1: 0a 1bbf80 fef8 9f [0] [ 5417.275994] bnx2 0000:01:00.1 eth1: 0b 1d2d80 f7f8 7f [0] [ 5417.276005] bnx2 0000:01:00.1 eth1: 0c 148f00 f7b8 88 [0] [ 5417.276016] bnx2 0000:01:00.1 eth1: 0d 16af80 f7d0 75 [0] [ 5417.276026] bnx2 0000:01:00.1 eth1: 0e 1adf80 bfb0 26 [0] [ 5417.276037] bnx2 0000:01:00.1 eth1: 0f 1ebf80 dd68 3c [0] [ 5417.276048] bnx2 0000:01:00.1 eth1: 10 1cf700 d1f0 fc [0] [ 5417.276059] bnx2 0000:01:00.1 eth1: 11 1cdc00 fbf0 7d [0] [ 5417.276069] bnx2 0000:01:00.1 eth1: 12 15c900 f7f8 ef [0] [ 5417.276081] bnx2 0000:01:00.1 eth1: 13 17cf00 d7d8 3f [0] [ 5417.276093] bnx2 0000:01:00.1 eth1: 14 1ecf80 ffb0 b7 [0] [ 5417.276107] bnx2 0000:01:00.1 eth1: 15 1cbd80 f3e8 bf [0] [ 5417.276119] bnx2 0000:01:00.1 eth1: 16 179b80 d7f8 d7 [0] [ 5417.276130] bnx2 0000:01:00.1 eth1: 17 1fdf00 f3e8 7e [0] [ 5417.276141] bnx2 0000:01:00.1 eth1: 18 1f9780 b578 af [0] [ 5417.276152] bnx2 0000:01:00.1 eth1: 19 1d7d80 fef0 ff [0] [ 5417.276163] bnx2 0000:01:00.1 eth1: 1a 1d9e80 5fe8 d7 [0] [ 5417.276174] bnx2 0000:01:00.1 eth1: 1b 1fff80 ebf8 f8 [0] [ 5417.276186] bnx2 0000:01:00.1 eth1: 1c 1fbd80 f7d8 7f [0] [ 5417.276200] bnx2 0000:01:00.1 eth1: 1d 16da80 2ef8 ff [0] [ 5417.276211] bnx2 0000:01:00.1 eth1: 1e 1f9b80 bf50 8e [0] [ 5417.276224] bnx2 0000:01:00.1 eth1: 1f 1bdf00 faf8 75 [0] [ 5417.276231] bnx2 0000:01:00.1 eth1: <--- end TBDC dump ---> [ 5417.276246] bnx2 0000:01:00.1 eth1: DEBUG: intr_sem[0] PCI_CMD[00100406] [ 5417.276258] bnx2 0000:01:00.1 eth1: DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088] [ 5417.276269] bnx2 0000:01:00.1 eth1: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000] [ 5417.276280] bnx2 0000:01:00.1 eth1: DEBUG: RPM_MGMT_PKT_CTRL[40000088] [ 5417.276288] bnx2 0000:01:00.1 eth1: DEBUG: HC_STATS_INTERRUPT_STATUS[01fb0004] [ 5417.276298] bnx2 0000:01:00.1 eth1: DEBUG: PBA[00000000] [ 5417.276304] bnx2 0000:01:00.1 eth1: <--- start MCP states dump ---> [ 5417.276314] bnx2 0000:01:00.1 eth1: DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e] [ 5417.276326] bnx2 0000:01:00.1 eth1: DEBUG: MCP mode[0000b880] state[80000000] evt_mask[00000500] [ 5417.276339] bnx2 0000:01:00.1 eth1: DEBUG: pc[0800d7b8] pc[08000cdc] instr[00041880] [ 5417.276349] bnx2 0000:01:00.1 eth1: DEBUG: shmem states: [ 5417.276358] bnx2 0000:01:00.1 eth1: DEBUG: drv_mb[0d000004] fw_mb[00000004] link_status[0000006f] [ 5417.276369] drv_pulse_mb[00001485] [ 5417.276373] bnx2 0000:01:00.1 eth1: DEBUG: dev_info_signature[44564903] reset_type[01005254] [ 5417.276383] condition[0003610e] [ 5417.276389] bnx2 0000:01:00.1 eth1: DEBUG: 000001c0: 01005254 42530000 0003610e 00000000 [ 5417.276402] bnx2 0000:01:00.1 eth1: DEBUG: 000003cc: 44444444 44444444 44444444 00000a28 [ 5417.276416] bnx2 0000:01:00.1 eth1: DEBUG: 000003dc: 0004ffff 00000000 00000000 00000000 [ 5417.276430] bnx2 0000:01:00.1 eth1: DEBUG: 000003ec: 00000000 00000000 00000000 00000000 [ 5417.276440] bnx2 0000:01:00.1 eth1: DEBUG: 0x3fc[0000ffff] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer 2014-01-31 13:29 ` Zoltan Kiss @ 2014-02-04 19:47 ` Michael Chan 2014-02-05 20:23 ` Zoltan Kiss 0 siblings, 1 reply; 11+ messages in thread From: Michael Chan @ 2014-02-04 19:47 UTC (permalink / raw) To: Zoltan Kiss Cc: Zoltan Kiss, Jeff Kirsher, Jesse Brandeburg, Bruce Allan, Carolyn Wyborny, Don Skidmore, Greg Rose, Peter P Waskiewicz Jr, Alex Duyck, John Ronciak, Tushar Dave, Akeem G Abodunrin, David S. Miller, e1000-devel, netdev, linux-kernel, xen-devel On Fri, 2014-01-31 at 14:29 +0100, Zoltan Kiss wrote: > [ 5417.275472] WARNING: at net/sched/sch_generic.c:255 > dev_watchdog+0x156/0x1f0() > [ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out The dump shows an internal IRQ pending on MSIX vector 2 which matches the the queue number that is timing out. I don't know what happened to the MSIX and why the driver is not seeing it. Do you see an IRQ error message from the kernel a few seconds before the tx timeout message? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer 2014-02-04 19:47 ` Michael Chan @ 2014-02-05 20:23 ` Zoltan Kiss 2014-02-05 20:27 ` Zoltan Kiss 2014-02-05 20:43 ` Andrew Cooper 0 siblings, 2 replies; 11+ messages in thread From: Zoltan Kiss @ 2014-02-05 20:23 UTC (permalink / raw) To: Michael Chan Cc: Jeff Kirsher, Jesse Brandeburg, Bruce Allan, Carolyn Wyborny, Don Skidmore, Greg Rose, Peter P Waskiewicz Jr, Alex Duyck, John Ronciak, Tushar Dave, Akeem G Abodunrin, David S. Miller, e1000-devel, netdev, linux-kernel, xen-devel, Andrew Cooper On 04/02/14 19:47, Michael Chan wrote: > On Fri, 2014-01-31 at 14:29 +0100, Zoltan Kiss wrote: >> [ 5417.275472] WARNING: at net/sched/sch_generic.c:255 >> dev_watchdog+0x156/0x1f0() >> [ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out > > The dump shows an internal IRQ pending on MSIX vector 2 which matches > the the queue number that is timing out. I don't know what happened to > the MSIX and why the driver is not seeing it. Do you see an IRQ error > message from the kernel a few seconds before the tx timeout message? I haven't seen any IRQ related error message. Note, this is on Xen 4.3.1. Now I have new results with a reworked version of the patch, unfortunately it still has this issue. Here is a bnx2 dump, lspci output and some Xen debug output (MSI and interrupt bindings, I have more if needed). [82099.288743] bnx2 0000:02:00.0 eth0: <--- start FTQ dump ---> [82099.288767] bnx2 0000:02:00.0 eth0: RV2P_PFTQ_CTL 00010002 [82099.288779] bnx2 0000:02:00.0 eth0: RV2P_TFTQ_CTL 00020000 [82099.288790] bnx2 0000:02:00.0 eth0: RV2P_MFTQ_CTL 00004000 [82099.288801] bnx2 0000:02:00.0 eth0: TBDR_FTQ_CTL 00404002 [82099.288812] bnx2 0000:02:00.0 eth0: TDMA_FTQ_CTL 00010002 [82099.288823] bnx2 0000:02:00.0 eth0: TXP_FTQ_CTL 00810002 [82099.288834] bnx2 0000:02:00.0 eth0: TXP_FTQ_CTL 01010002 [82099.288845] bnx2 0000:02:00.0 eth0: TPAT_FTQ_CTL 00010002 [82099.288878] bnx2 0000:02:00.0 eth0: RXP_CFTQ_CTL 00008000 [82099.288889] bnx2 0000:02:00.0 eth0: RXP_FTQ_CTL 00100002 [82099.288899] bnx2 0000:02:00.0 eth0: COM_COMXQ_FTQ_CTL 00010000 [82099.288911] bnx2 0000:02:00.0 eth0: COM_COMTQ_FTQ_CTL 00020000 [82099.288923] bnx2 0000:02:00.0 eth0: COM_COMQ_FTQ_CTL 00010000 [82099.288934] bnx2 0000:02:00.0 eth0: CP_CPQ_FTQ_CTL 00004000 [82099.288944] bnx2 0000:02:00.0 eth0: CPU states: [82099.288960] bnx2 0000:02:00.0 eth0: 045000 mode b84c state 80005000 evt_mask 500 pc 8001284 pc 8000cb8 instr 35690100 [82099.288984] bnx2 0000:02:00.0 eth0: 085000 mode b84c state 80001000 evt_mask 500 pc 8000a58 pc 8000a4c instr 38420001 [82099.289007] bnx2 0000:02:00.0 eth0: 0c5000 mode b84c state 80001000 evt_mask 500 pc 8004c14 pc 8004c14 instr 32050003 [82099.289030] bnx2 0000:02:00.0 eth0: 105000 mode b8cc state 80000000 evt_mask 500 pc 8000a94 pc 8000a94 instr 8c420020 [82099.289063] bnx2 0000:02:00.0 eth0: 145000 mode b880 state 80000000 evt_mask 500 pc 800d244 pc 8008aac instr 8c460000 [82099.289087] bnx2 0000:02:00.0 eth0: 185000 mode b8cc state 80000000 evt_mask 500 pc 8000c6c pc 8000c6c instr 3c056000 [82099.289103] bnx2 0000:02:00.0 eth0: <--- end FTQ dump ---> [82099.289112] bnx2 0000:02:00.0 eth0: <--- start TBDC dump ---> [82099.289124] bnx2 0000:02:00.0 eth0: TBDC free cnt: 31 [82099.289133] bnx2 0000:02:00.0 eth0: LINE CID BIDX CMD VALIDS [82099.289148] bnx2 0000:02:00.0 eth0: 00 000800 a3b8 00 [1] [82099.289163] bnx2 0000:02:00.0 eth0: 01 001100 1b58 00 [0] [82099.289178] bnx2 0000:02:00.0 eth0: 02 000800 a390 00 [0] [82099.289193] bnx2 0000:02:00.0 eth0: 03 000800 a370 00 [0] [82099.289217] bnx2 0000:02:00.0 eth0: 04 000800 a378 00 [0] [82099.289232] bnx2 0000:02:00.0 eth0: 05 000800 a388 00 [0] [82099.289247] bnx2 0000:02:00.0 eth0: 06 000800 a398 00 [0] [82099.289262] bnx2 0000:02:00.0 eth0: 07 000800 a3a8 00 [0] [82099.289277] bnx2 0000:02:00.0 eth0: 08 000800 a3b0 00 [0] [82099.289291] bnx2 0000:02:00.0 eth0: 09 000800 a3b8 00 [0] [82099.289306] bnx2 0000:02:00.0 eth0: 0a 000800 8c10 00 [0] [82099.289321] bnx2 0000:02:00.0 eth0: 0b 000800 eaf0 00 [0] [82099.289336] bnx2 0000:02:00.0 eth0: 0c 000800 eaf8 00 [0] [82099.289351] bnx2 0000:02:00.0 eth0: 0d 001100 5e60 00 [0] [82099.289365] bnx2 0000:02:00.0 eth0: 0e 001100 5e68 00 [0] [82099.289380] bnx2 0000:02:00.0 eth0: 0f 001100 5e70 00 [0] [82099.289395] bnx2 0000:02:00.0 eth0: 10 001100 5e88 00 [0] [82099.289410] bnx2 0000:02:00.0 eth0: 11 001100 5e90 00 [0] [82099.289425] bnx2 0000:02:00.0 eth0: 12 001100 5ee8 00 [0] [82099.289440] bnx2 0000:02:00.0 eth0: 13 001100 5ef8 00 [0] [82099.289454] bnx2 0000:02:00.0 eth0: 14 001100 5e00 00 [0] [82099.289470] bnx2 0000:02:00.0 eth0: 15 001100 5a20 00 [0] [82099.289485] bnx2 0000:02:00.0 eth0: 16 001100 59a8 00 [0] [82099.289499] bnx2 0000:02:00.0 eth0: 17 001100 59b0 00 [0] [82099.289514] bnx2 0000:02:00.0 eth0: 18 001100 59b8 00 [0] [82099.289529] bnx2 0000:02:00.0 eth0: 19 001100 5a28 00 [0] [82099.289544] bnx2 0000:02:00.0 eth0: 1a 001100 5a30 00 [0] [82099.289559] bnx2 0000:02:00.0 eth0: 1b 000800 8c58 00 [0] [82099.289573] bnx2 0000:02:00.0 eth0: 1c 000800 8c60 00 [0] [82099.289588] bnx2 0000:02:00.0 eth0: 1d 055e80 dca8 fb [0] [82099.289603] bnx2 0000:02:00.0 eth0: 1e 1cf780 f7b8 af [0] [82099.289618] bnx2 0000:02:00.0 eth0: 1f 1dff80 efe0 bf [0] [82099.289629] bnx2 0000:02:00.0 eth0: <--- end TBDC dump ---> [82099.289644] bnx2 0000:02:00.0 eth0: DEBUG: intr_sem[0] PCI_CMD[00100406] [82099.289661] bnx2 0000:02:00.0 eth0: DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088] [82099.289676] bnx2 0000:02:00.0 eth0: DEBUG: EMAC_TX_STATUS[0000000e] EMAC_RX_STATUS[00000000] [82099.289691] bnx2 0000:02:00.0 eth0: DEBUG: RPM_MGMT_PKT_CTRL[40000088] [82099.289703] bnx2 0000:02:00.0 eth0: DEBUG: HC_STATS_INTERRUPT_STATUS[01ee0000] [82099.289716] bnx2 0000:02:00.0 eth0: DEBUG: PBA[00000000] [82099.289726] bnx2 0000:02:00.0 eth0: <--- start MCP states dump ---> [82099.289739] bnx2 0000:02:00.0 eth0: DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e] [82099.289756] bnx2 0000:02:00.0 eth0: DEBUG: MCP mode[0000b880] state[80008000] evt_mask[00000500] [82099.289773] bnx2 0000:02:00.0 eth0: DEBUG: pc[0800b110] pc[0800aff0] instr[afbf0048] [82099.289787] bnx2 0000:02:00.0 eth0: DEBUG: shmem states: [82099.289800] bnx2 0000:02:00.0 eth0: DEBUG: drv_mb[0d000005] fw_mb[00000005] link_status[0010026f] [82099.289820] bnx2 0000:02:00.0 eth0: DEBUG: dev_info_signature[44564903] reset_type[01005254] [82099.289842] bnx2 0000:02:00.0 eth0: DEBUG: 000001c0: 01005254 42530088 0003610e 00000000 [82099.289860] bnx2 0000:02:00.0 eth0: DEBUG: 000003cc: 44444444 44444444 44444444 00000a28 [82099.289879] bnx2 0000:02:00.0 eth0: DEBUG: 000003dc: 000cffff 00000000 ffff0000 00000000 [82099.289897] bnx2 0000:02:00.0 eth0: DEBUG: 000003ec: 00000000 00000000 00000000 00000000 [82099.289911] bnx2 0000:02:00.0 eth0: DEBUG: 0x3fc[0000ffff] [82099.289921] bnx2 0000:02:00.0 eth0: <--- end MCP states dump ---> lspci -s 02:00.0 -vv 02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet (rev 20) Subsystem: Dell Device 045f Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 32 Region 0: Memory at c8000000 (64-bit, non-prefetchable) [size=32M] Expansion ROM at c6000000 [disabled] [size=128K] Capabilities: [48] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME- Capabilities: [50] Vital Product Data Product Name: Broadcom NetXtreme II Ethernet Controller Read-only fields: [PN] Part number: BCM95709C0 [EC] Engineering changes: 220197-2 [SN] Serial number: 0123456789 [MN] Manufacture ID: 31 30 32 38 [V0] Vendor specific: 6.2.14 [RV] Reserved: checksum good, 22 byte(s) reserved End Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+ Address: 0000000000000000 Data: 0000 Capabilities: [a0] MSI-X: Enable+ Count=9 Masked- Vector table: BAR=0 offset=0000c000 PBA: BAR=0 offset=0000e000 Capabilities: [ac] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+ RlxdOrd+ ExtTag- PhantFunc- AuxPwr+ NoSnoop+ MaxPayload 256 bytes, MaxReadReq 512 bytes DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend- LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Latency L0 <2us, L1 <2us ClockPM- Surprise- LLActRep- BwNot- LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [100 v1] Device Serial Number b8-ac-6f-ff-fe-b4-17-20 Capabilities: [110 v1] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ CEMsk: RxErr- BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+ AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn- Capabilities: [150 v1] Power Budgeting <?> Capabilities: [160 v1] Virtual Channel Caps: LPEVC=0 RefClk=100ns PATEntryBits=1 Arb: Fixed- WRR32- WRR64- WRR128- Ctrl: ArbSelect=Fixed Status: InProgress- VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans- Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256- Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff Status: NegoPending- InProgress- Kernel driver in use: bnx2 Kernel modules: bnx2 (XEN) [2014-02-05 20:15:20] MSI information: (XEN) [2014-02-05 20:15:20] IOMMU 56 vec=28 fixed edge assert phys cpu dest=00000020 mask=1/0/? (XEN) [2014-02-05 20:15:20] MSI 57 vec=d0 fixed edge assert phys cpu dest=00000020 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI 58 vec=d8 fixed edge assert phys cpu dest=00000020 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI 59 vec=21 fixed edge assert phys cpu dest=00000020 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI 60 vec=29 fixed edge assert phys cpu dest=00000020 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI 61 vec=31 fixed edge assert phys cpu dest=00000020 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI 62 vec=39 fixed edge assert phys cpu dest=00000020 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 63 vec=62 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 64 vec=d7 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 65 vec=ba fixed edge assert phys cpu dest=00000000 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 66 vec=92 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 67 vec=3a fixed edge assert phys cpu dest=00000021 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 68 vec=b8 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 69 vec=2a fixed edge assert phys cpu dest=00000020 mask=1/1/1 (XEN) [2014-02-05 20:15:20] MSI-X 70 vec=33 fixed edge assert phys cpu dest=00000021 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 71 vec=c2 fixed edge assert phys cpu dest=00000000 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 72 vec=9a fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 73 vec=d0 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 74 vec=da fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 75 vec=b2 fixed edge assert phys cpu dest=00000020 mask=1/1/1 (XEN) [2014-02-05 20:15:20] MSI-X 76 vec=6b fixed edge assert phys cpu dest=00000021 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 77 vec=68 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 78 vec=b2 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 79 vec=53 fixed edge assert phys cpu dest=00000021 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 80 vec=78 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 81 vec=4b fixed edge assert phys cpu dest=00000020 mask=1/1/1 (XEN) [2014-02-05 20:15:20] MSI-X 82 vec=a7 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 83 vec=63 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 84 vec=6f fixed edge assert phys cpu dest=00000000 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 85 vec=5b fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 86 vec=99 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 87 vec=a3 fixed edge assert phys cpu dest=00000020 mask=1/1/1 (XEN) [2014-02-05 20:15:20] MSI-X 88 vec=73 fixed edge assert phys cpu dest=00000000 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 89 vec=58 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 90 vec=aa fixed edge assert phys cpu dest=00000000 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 91 vec=38 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 92 vec=8f fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 93 vec=3c fixed edge assert phys cpu dest=00000020 mask=1/1/1 (XEN) [2014-02-05 20:15:20] MSI-X 94 vec=3b fixed edge assert phys cpu dest=00000021 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 95 vec=ca fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 96 vec=a8 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 97 vec=32 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 98 vec=23 fixed edge assert phys cpu dest=00000000 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 99 vec=94 fixed edge assert phys cpu dest=00000020 mask=1/1/1 (XEN) [2014-02-05 20:15:20] MSI-X 100 vec=7b fixed edge assert phys cpu dest=00000021 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 101 vec=60 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 102 vec=a2 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 103 vec=4b fixed edge assert phys cpu dest=00000021 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 104 vec=a1 fixed edge assert phys cpu dest=00000021 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 105 vec=2d fixed edge assert phys cpu dest=00000020 mask=1/1/1 (XEN) [2014-02-05 20:15:20] MSI-X 106 vec=43 fixed edge assert phys cpu dest=00000000 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 107 vec=d2 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 108 vec=61 fixed edge assert phys cpu dest=00000000 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 109 vec=21 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 110 vec=2b fixed edge assert phys cpu dest=00000000 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 111 vec=85 fixed edge assert phys cpu dest=00000020 mask=1/1/1 (XEN) [2014-02-05 20:15:22] Guest interrupt information: (XEN) [2014-02-05 20:15:22] IRQ: 0 affinity:00000001 vec:f0 type=IO-APIC-edge status=00000000 timer_interrupt+0/0x18f (XEN) [2014-02-05 20:15:22] IRQ: 1 affinity:00000001 vec:30 type=IO-APIC-edge status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 3 affinity:00000001 vec:38 type=IO-APIC-edge status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 4 affinity:00000001 vec:f1 type=IO-APIC-edge status=00000000 ns16550_interrupt+0/0x73 (XEN) [2014-02-05 20:15:22] IRQ: 5 affinity:00000001 vec:40 type=IO-APIC-edge status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 6 affinity:00000001 vec:48 type=IO-APIC-edge status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 7 affinity:00000001 vec:50 type=IO-APIC-edge status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 8 affinity:00000001 vec:58 type=IO-APIC-edge status=00000030 in-flight=0 domain-list=0: 8(---), (XEN) [2014-02-05 20:15:22] IRQ: 9 affinity:00000001 vec:60 type=IO-APIC-level status=00000030 in-flight=0 domain-list=0: 9(---), (XEN) [2014-02-05 20:15:22] IRQ: 10 affinity:00000001 vec:68 type=IO-APIC-edge status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 11 affinity:00000001 vec:70 type=IO-APIC-edge status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 12 affinity:00000001 vec:78 type=IO-APIC-edge status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 13 affinity:0000ffff vec:88 type=IO-APIC-edge status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 14 affinity:00000001 vec:90 type=IO-APIC-edge status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 15 affinity:00000001 vec:98 type=IO-APIC-edge status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 17 affinity:00000010 vec:59 type=IO-APIC-level status=00000030 in-flight=0 domain-list=0: 17(---), (XEN) [2014-02-05 20:15:22] IRQ: 18 affinity:00000010 vec:61 type=IO-APIC-level status=00000030 in-flight=0 domain-list=0: 18(---), (XEN) [2014-02-05 20:15:22] IRQ: 19 affinity:00000010 vec:b9 type=IO-APIC-level status=00000030 in-flight=0 domain-list=0: 19(---), (XEN) [2014-02-05 20:15:22] IRQ: 20 affinity:00000010 vec:71 type=IO-APIC-level status=00000030 in-flight=0 domain-list=0: 20(---), (XEN) [2014-02-05 20:15:22] IRQ: 21 affinity:00000400 vec:6d type=IO-APIC-level status=00000030 in-flight=0 domain-list=0: 21(---), (XEN) [2014-02-05 20:15:22] IRQ: 32 affinity:0000ffff vec:79 type=IO-APIC-level status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 33 affinity:0000ffff vec:89 type=IO-APIC-level status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 36 affinity:0000ffff vec:41 type=IO-APIC-level status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 38 affinity:0000ffff vec:99 type=IO-APIC-level status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 39 affinity:0000ffff vec:a9 type=IO-APIC-level status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 42 affinity:0000ffff vec:81 type=IO-APIC-level status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 43 affinity:0000ffff vec:91 type=IO-APIC-level status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 45 affinity:0000ffff vec:a1 type=IO-APIC-level status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 47 affinity:0000ffff vec:b1 type=IO-APIC-level status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 53 affinity:0000ffff vec:a0 type=IO-APIC-level status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 56 affinity:0000ffff vec:28 type=DMA_MSI status=00000000 iommu_page_fault+0/0x12 (XEN) [2014-02-05 20:15:22] IRQ: 57 affinity:00000001 vec:d0 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:311(---), (XEN) [2014-02-05 20:15:22] IRQ: 58 affinity:00000001 vec:d8 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:310(---), (XEN) [2014-02-05 20:15:22] IRQ: 59 affinity:00000001 vec:21 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:309(---), (XEN) [2014-02-05 20:15:22] IRQ: 60 affinity:00000001 vec:29 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:308(---), (XEN) [2014-02-05 20:15:22] IRQ: 61 affinity:00000001 vec:31 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:307(---), (XEN) [2014-02-05 20:15:22] IRQ: 62 affinity:00000001 vec:39 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:306(---), (XEN) [2014-02-05 20:15:22] IRQ: 63 affinity:00000004 vec:62 type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:305(---), (XEN) [2014-02-05 20:15:22] IRQ: 64 affinity:00000004 vec:d7 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:304(---), (XEN) [2014-02-05 20:15:22] IRQ: 65 affinity:00000100 vec:ba type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:303(---), (XEN) [2014-02-05 20:15:22] IRQ: 66 affinity:00000004 vec:92 type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:302(---), (XEN) [2014-02-05 20:15:22] IRQ: 67 affinity:00000002 vec:3a type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:301(---), (XEN) [2014-02-05 20:15:22] IRQ: 68 affinity:00000004 vec:b8 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:300(---), (XEN) [2014-02-05 20:15:22] IRQ: 69 affinity:00000001 vec:2a type=PCI-MSI/-X status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 70 affinity:00000002 vec:33 type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:298(---), (XEN) [2014-02-05 20:15:22] IRQ: 71 affinity:00000100 vec:c2 type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:297(---), (XEN) [2014-02-05 20:15:22] IRQ: 72 affinity:00000004 vec:9a type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:296(---), (XEN) [2014-02-05 20:15:22] IRQ: 73 affinity:00000004 vec:d0 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:295(---), (XEN) [2014-02-05 20:15:22] IRQ: 74 affinity:00000004 vec:da type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:294(---), (XEN) [2014-02-05 20:15:22] IRQ: 75 affinity:00000001 vec:b2 type=PCI-MSI/-X status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 76 affinity:00000002 vec:6b type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:292(---), (XEN) [2014-02-05 20:15:22] IRQ: 77 affinity:00000004 vec:68 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:291(---), (XEN) [2014-02-05 20:15:22] IRQ: 78 affinity:00000004 vec:b2 type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:290(---), (XEN) [2014-02-05 20:15:22] IRQ: 79 affinity:00000002 vec:53 type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:289(---), (XEN) [2014-02-05 20:15:22] IRQ: 80 affinity:00000004 vec:78 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:288(---), (XEN) [2014-02-05 20:15:22] IRQ: 81 affinity:00000001 vec:4b type=PCI-MSI/-X status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 82 affinity:00000004 vec:a7 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:286(---), (XEN) [2014-02-05 20:15:22] IRQ: 83 affinity:00000004 vec:63 type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:285(---), (XEN) [2014-02-05 20:15:22] IRQ: 84 affinity:00000100 vec:6f type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:284(---), (XEN) [2014-02-05 20:15:22] IRQ: 85 affinity:00000004 vec:5b type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:283(---), (XEN) [2014-02-05 20:15:22] IRQ: 86 affinity:00000004 vec:99 type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:282(---), (XEN) [2014-02-05 20:15:22] IRQ: 87 affinity:00000001 vec:a3 type=PCI-MSI/-X status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 88 affinity:00000100 vec:73 type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:280(---), (XEN) [2014-02-05 20:15:22] IRQ: 89 affinity:00000004 vec:58 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:279(---), (XEN) [2014-02-05 20:15:22] IRQ: 90 affinity:00000100 vec:aa type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:278(---), (XEN) [2014-02-05 20:15:22] IRQ: 91 affinity:00000004 vec:38 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:277(---), (XEN) [2014-02-05 20:15:22] IRQ: 92 affinity:00000004 vec:8f type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:276(---), (XEN) [2014-02-05 20:15:22] IRQ: 93 affinity:00000001 vec:3c type=PCI-MSI/-X status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 94 affinity:00000002 vec:3b type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:274(---), (XEN) [2014-02-05 20:15:22] IRQ: 95 affinity:00000004 vec:ca type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:273(---), (XEN) [2014-02-05 20:15:22] IRQ: 96 affinity:00000004 vec:a8 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:272(---), (XEN) [2014-02-05 20:15:22] IRQ: 97 affinity:00000004 vec:32 type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:271(---), (XEN) [2014-02-05 20:15:22] IRQ: 98 affinity:00000100 vec:23 type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:270(---), (XEN) [2014-02-05 20:15:22] IRQ: 99 affinity:00000001 vec:94 type=PCI-MSI/-X status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 100 affinity:00000002 vec:7b type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:268(---), (XEN) [2014-02-05 20:15:22] IRQ: 101 affinity:00000004 vec:60 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:267(---), (XEN) [2014-02-05 20:15:22] IRQ: 102 affinity:00000004 vec:a2 type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:266(---), (XEN) [2014-02-05 20:15:22] IRQ: 103 affinity:00000002 vec:4b type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:265(---), (XEN) [2014-02-05 20:15:22] IRQ: 104 affinity:00000002 vec:a1 type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:264(---), (XEN) [2014-02-05 20:15:22] IRQ: 105 affinity:00000001 vec:2d type=PCI-MSI/-X status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IRQ: 106 affinity:00000100 vec:43 type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:262(---), (XEN) [2014-02-05 20:15:22] IRQ: 107 affinity:00000004 vec:d2 type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:261(---), (XEN) [2014-02-05 20:15:22] IRQ: 108 affinity:00000100 vec:61 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:260(---), (XEN) [2014-02-05 20:15:22] IRQ: 109 affinity:00000004 vec:21 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:259(---), (XEN) [2014-02-05 20:15:22] IRQ: 110 affinity:00000100 vec:2b type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:258(---), (XEN) [2014-02-05 20:15:22] IRQ: 111 affinity:00000001 vec:85 type=PCI-MSI/-X status=00000002 mapped, unbound (XEN) [2014-02-05 20:15:22] IO-APIC interrupt information: (XEN) [2014-02-05 20:15:22] IRQ 0 Vec240: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 2: vec=f0 delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 1 Vec 48: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 1: vec=30 delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 3 Vec 56: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 3: vec=38 delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 4 Vec241: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 4: vec=f1 delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 5 Vec 64: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 5: vec=40 delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 6 Vec 72: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 6: vec=48 delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 7 Vec 80: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 7: vec=50 delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 8 Vec 88: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 8: vec=58 delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 9 Vec 96: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 9: vec=60 delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=L mask=0 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 10 Vec104: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 10: vec=68 delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 11 Vec112: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 11: vec=70 delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 12 Vec120: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 12: vec=78 delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 13 Vec136: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 13: vec=88 delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=1 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 14 Vec144: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 14: vec=90 delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 15 Vec152: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 15: vec=98 delivery=Fixed dest=P status=0 polarity=0 irr=0 trig=E mask=0 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 17 Vec 89: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 17: vec=59 delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=0 dest_id:50 (XEN) [2014-02-05 20:15:22] IRQ 18 Vec 97: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 18: vec=61 delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=0 dest_id:50 (XEN) [2014-02-05 20:15:22] IRQ 19 Vec185: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 19: vec=b9 delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=0 dest_id:50 (XEN) [2014-02-05 20:15:22] IRQ 20 Vec113: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 20: vec=71 delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=0 dest_id:50 (XEN) [2014-02-05 20:15:22] IRQ 21 Vec109: (XEN) [2014-02-05 20:15:22] Apic 0x00, Pin 21: vec=6d delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=0 dest_id:2 (XEN) [2014-02-05 20:15:22] IRQ 32 Vec121: (XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 0: vec=79 delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 33 Vec137: (XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 1: vec=89 delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 36 Vec 65: (XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 4: vec=41 delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 38 Vec153: (XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 6: vec=99 delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 39 Vec169: (XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 7: vec=a9 delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 42 Vec129: (XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 10: vec=81 delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 43 Vec145: (XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 11: vec=91 delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 45 Vec161: (XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 13: vec=a1 delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 47 Vec177: (XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 15: vec=b1 delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32 (XEN) [2014-02-05 20:15:22] IRQ 53 Vec160: (XEN) [2014-02-05 20:15:22] Apic 0x01, Pin 21: vec=a0 delivery=Fixed dest=P status=0 polarity=1 irr=0 trig=L mask=1 dest_id:32 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer 2014-02-05 20:23 ` Zoltan Kiss @ 2014-02-05 20:27 ` Zoltan Kiss 2014-02-05 20:43 ` Andrew Cooper 1 sibling, 0 replies; 11+ messages in thread From: Zoltan Kiss @ 2014-02-05 20:27 UTC (permalink / raw) To: Michael Chan Cc: Jeff Kirsher, Jesse Brandeburg, Bruce Allan, Carolyn Wyborny, Don Skidmore, Greg Rose, Peter P Waskiewicz Jr, Alex Duyck, John Ronciak, Tushar Dave, Akeem G Abodunrin, David S. Miller, e1000-devel, netdev, linux-kernel, xen-devel, Andrew Cooper On 05/02/14 20:23, Zoltan Kiss wrote: > On 04/02/14 19:47, Michael Chan wrote: >> On Fri, 2014-01-31 at 14:29 +0100, Zoltan Kiss wrote: >>> [ 5417.275472] WARNING: at net/sched/sch_generic.c:255 >>> dev_watchdog+0x156/0x1f0() >>> [ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out >> >> The dump shows an internal IRQ pending on MSIX vector 2 which matches >> the the queue number that is timing out. I don't know what happened to >> the MSIX and why the driver is not seeing it. Do you see an IRQ error >> message from the kernel a few seconds before the tx timeout message? > > I haven't seen any IRQ related error message. Note, this is on Xen > 4.3.1. Now I have new results with a reworked version of the patch, > unfortunately it still has this issue. Here is a bnx2 dump, lspci output > and some Xen debug output (MSI and interrupt bindings, I have more if > needed). And here is the watchdog message and the first dump, if it matters: [10118.282007] ------------[ cut here ]------------ [10118.282018] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x156/0x1f0() [10118.282021] NETDEV WATCHDOG: eth0 (bnx2): transmit queue 0 timed out [10118.282024] Modules linked in: tun nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss oid_registry nfsv4 nfs fscache lockd sunrpc ipv6 openvswitch(O) frag_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables sr_mod cdrom nls_utf8 isofs dm_multipath scsi_dh dm_mod usb_storage lk_helper cryptd lrw aes_i586 xts gf128mul coretemp microcode hid_generic lpc_ich mfd_core ehci_pci ehci_hcd i7core_edac edac_core bnx2 sg hed u scsi_transport_sas raid_class scsi_mod [10118.282083] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G O 3.10.11-0.xs1.8.50.175.377583 #1 [10118.282086] Hardware name: Dell Inc. PowerEdge M710HD/05GGXD, BIOS 2.0.0 01/31/2011 [10118.282089] 000000ff ee0a5dd0 c1488cd3 ee0a5df8 c1046664 c1658a88 ee0a5e24 000000ff [10118.282097] c13fc1c6 c13fc1c6 ec778000 00000000 00256a1c ee0a5e10 c1046723 00000009 [10118.282104] ee0a5e08 c1658a88 ee0a5e24 ee0a5e48 c13fc1c6 c16556e1 000000ff c1658a88 [10118.282112] Call Trace: [10118.282118] [<c1488cd3>] dump_stack+0x16/0x1b [10118.282125] [<c1046664>] warn_slowpath_common+0x64/0x80 [10118.282129] [<c13fc1c6>] ? dev_watchdog+0x156/0x1f0 [10118.282133] [<c13fc1c6>] ? dev_watchdog+0x156/0x1f0 [10118.282137] [<c1046723>] warn_slowpath_fmt+0x33/0x40 [10118.282141] [<c13fc1c6>] dev_watchdog+0x156/0x1f0 [10118.282149] [<c10549ce>] call_timer_fn+0x3e/0xf0 [10118.282155] [<c10013a7>] ? xen_hypercall_sched_op+0x7/0x20 [10118.282159] [<c13fc070>] ? __netdev_watchdog_up+0x60/0x60 [10118.282164] [<c1055c1b>] run_timer_softirq+0x1ab/0x210 [10118.282169] [<c10be4fd>] ? irq_get_irq_data+0xd/0x10 [10118.282176] [<c130fb2d>] ? info_for_irq+0xd/0x20 [10118.282180] [<c13fc070>] ? __netdev_watchdog_up+0x60/0x60 [10118.282184] [<c104e3f4>] __do_softirq+0xc4/0x200 [10118.282189] [<c1312316>] ? evtchn_fifo_handle_events+0xf6/0x120 [10118.282193] [<c104e5bd>] irq_exit+0x3d/0x90 [10118.282198] [<c130fe55>] xen_evtchn_do_upcall+0x25/0x40 [10118.282203] [<c14935c7>] xen_do_upcall+0x7/0xc [10118.282207] [<c10013a7>] ? xen_hypercall_sched_op+0x7/0x20 [10118.282213] [<c1007f12>] ? xen_safe_halt+0x12/0x20 [10118.282218] [<c1015eff>] default_idle+0x3f/0xb0 [10118.282222] [<c1015a17>] arch_cpu_idle+0x17/0x30 [10118.282229] [<c108f591>] cpu_startup_entry+0x141/0x1f0 [10118.282234] [<c147d11b>] cpu_bringup_and_idle+0x12/0x14 [10118.282237] ---[ end trace 25ed24391f6c7acd ]--- [10118.282242] bnx2 0000:02:00.0 eth0: <--- start FTQ dump ---> [10118.282267] bnx2 0000:02:00.0 eth0: RV2P_PFTQ_CTL 00010000 [10118.282277] bnx2 0000:02:00.0 eth0: RV2P_TFTQ_CTL 00020000 [10118.282288] bnx2 0000:02:00.0 eth0: RV2P_MFTQ_CTL 00004000 [10118.282298] bnx2 0000:02:00.0 eth0: TBDR_FTQ_CTL 00004002 [10118.282309] bnx2 0000:02:00.0 eth0: TDMA_FTQ_CTL 00010002 [10118.282319] bnx2 0000:02:00.0 eth0: TXP_FTQ_CTL 01810002 [10118.282330] bnx2 0000:02:00.0 eth0: TXP_FTQ_CTL 01810002 [10118.282340] bnx2 0000:02:00.0 eth0: TPAT_FTQ_CTL 00010002 [10118.282372] bnx2 0000:02:00.0 eth0: RXP_CFTQ_CTL 00008000 [10118.282383] bnx2 0000:02:00.0 eth0: RXP_FTQ_CTL 00100000 [10118.282392] bnx2 0000:02:00.0 eth0: COM_COMXQ_FTQ_CTL 00010000 [10118.282403] bnx2 0000:02:00.0 eth0: COM_COMTQ_FTQ_CTL 00020000 [10118.282414] bnx2 0000:02:00.0 eth0: COM_COMQ_FTQ_CTL 00010000 [10118.282425] bnx2 0000:02:00.0 eth0: CP_CPQ_FTQ_CTL 00004000 [10118.282434] bnx2 0000:02:00.0 eth0: CPU states: [10118.282449] bnx2 0000:02:00.0 eth0: 045000 mode b84c state 80001000 evt_mask 500 pc 8000844 pc 80012bc instr a0e00012 [10118.282471] bnx2 0000:02:00.0 eth0: 085000 mode b84c state 80001000 evt_mask 500 pc 8000a50 pc 8000ac4 instr 38420001 [10118.282493] bnx2 0000:02:00.0 eth0: 0c5000 mode b84c state 80001000 evt_mask 500 pc 8004c14 pc 8004c18 instr 32070001 [10118.282515] bnx2 0000:02:00.0 eth0: 105000 mode b8cc state 80000000 evt_mask 500 pc 8000a9c pc 8000b28 instr 8c530000 [10118.282537] bnx2 0000:02:00.0 eth0: 145000 mode b880 state 80000000 evt_mask 500 pc 800d1a8 pc 800af74 instr 441010a [10118.282560] bnx2 0000:02:00.0 eth0: 185000 mode b8cc state 80000000 evt_mask 500 pc 8000918 pc 8000928 instr 8f870048 [10118.282577] bnx2 0000:02:00.0 eth0: <--- end FTQ dump ---> [10118.282586] bnx2 0000:02:00.0 eth0: <--- start TBDC dump ---> [10118.282597] bnx2 0000:02:00.0 eth0: TBDC free cnt: 31 [10118.282607] bnx2 0000:02:00.0 eth0: LINE CID BIDX CMD VALIDS [10118.282622] bnx2 0000:02:00.0 eth0: 00 000800 a3b8 00 [1] [10118.282637] bnx2 0000:02:00.0 eth0: 01 001100 1b58 00 [0] [10118.282652] bnx2 0000:02:00.0 eth0: 02 000800 a390 00 [0] [10118.282667] bnx2 0000:02:00.0 eth0: 03 000800 a370 00 [0] [10118.282682] bnx2 0000:02:00.0 eth0: 04 000800 a378 00 [0] [10118.282696] bnx2 0000:02:00.0 eth0: 05 000800 a388 00 [0] [10118.282711] bnx2 0000:02:00.0 eth0: 06 000800 a398 00 [0] [10118.282726] bnx2 0000:02:00.0 eth0: 07 000800 a3a8 00 [0] [10118.282741] bnx2 0000:02:00.0 eth0: 08 000800 a3b0 00 [0] [10118.282756] bnx2 0000:02:00.0 eth0: 09 000800 a3b8 00 [0] [10118.282771] bnx2 0000:02:00.0 eth0: 0a 000800 8c10 00 [0] [10118.282785] bnx2 0000:02:00.0 eth0: 0b 000800 eaf0 00 [0] [10118.282800] bnx2 0000:02:00.0 eth0: 0c 000800 eaf8 00 [0] [10118.282815] bnx2 0000:02:00.0 eth0: 0d 001100 5e60 00 [0] [10118.282830] bnx2 0000:02:00.0 eth0: 0e 001100 5e68 00 [0] [10118.282845] bnx2 0000:02:00.0 eth0: 0f 001100 5e70 00 [0] [10118.282860] bnx2 0000:02:00.0 eth0: 10 001100 5e88 00 [0] [10118.282875] bnx2 0000:02:00.0 eth0: 11 001100 5e90 00 [0] [10118.282890] bnx2 0000:02:00.0 eth0: 12 001100 5ee8 00 [0] [10118.282905] bnx2 0000:02:00.0 eth0: 13 001100 5ef8 00 [0] [10118.282920] bnx2 0000:02:00.0 eth0: 14 001100 5e00 00 [0] [10118.282935] bnx2 0000:02:00.0 eth0: 15 001100 5a20 00 [0] [10118.282950] bnx2 0000:02:00.0 eth0: 16 001100 59a8 00 [0] [10118.282964] bnx2 0000:02:00.0 eth0: 17 001100 59b0 00 [0] [10118.282979] bnx2 0000:02:00.0 eth0: 18 001100 59b8 00 [0] [10118.282994] bnx2 0000:02:00.0 eth0: 19 001100 5a28 00 [0] [10118.283009] bnx2 0000:02:00.0 eth0: 1a 001100 5a30 00 [0] [10118.283024] bnx2 0000:02:00.0 eth0: 1b 000800 8c58 00 [0] [10118.283038] bnx2 0000:02:00.0 eth0: 1c 000800 8c60 00 [0] [10118.283053] bnx2 0000:02:00.0 eth0: 1d 055e80 dca8 fb [0] [10118.283068] bnx2 0000:02:00.0 eth0: 1e 1cf780 f7b8 af [0] [10118.283083] bnx2 0000:02:00.0 eth0: 1f 1dff80 efe0 bf [0] [10118.283094] bnx2 0000:02:00.0 eth0: <--- end TBDC dump ---> [10118.283111] bnx2 0000:02:00.0 eth0: DEBUG: intr_sem[0] PCI_CMD[00100406] [10118.283128] bnx2 0000:02:00.0 eth0: DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088] [10118.283143] bnx2 0000:02:00.0 eth0: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000] [10118.283159] bnx2 0000:02:00.0 eth0: DEBUG: RPM_MGMT_PKT_CTRL[40000088] [10118.283170] bnx2 0000:02:00.0 eth0: DEBUG: HC_STATS_INTERRUPT_STATUS[01fe0001] [10118.283184] bnx2 0000:02:00.0 eth0: DEBUG: PBA[00000000] [10118.283194] bnx2 0000:02:00.0 eth0: <--- start MCP states dump ---> [10118.283207] bnx2 0000:02:00.0 eth0: DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e] [10118.283224] bnx2 0000:02:00.0 eth0: DEBUG: MCP mode[0000b880] state[80004000] evt_mask[00000500] [10118.283242] bnx2 0000:02:00.0 eth0: DEBUG: pc[0800d244] pc[0800b0ac] instr[00000000] [10118.283255] bnx2 0000:02:00.0 eth0: DEBUG: shmem states: [10118.283268] bnx2 0000:02:00.0 eth0: DEBUG: drv_mb[0d000005] fw_mb[00000005] link_status[0010026f] [10118.283283] drv_pulse_mb[00002768] [10118.283288] bnx2 0000:02:00.0 eth0: DEBUG: dev_info_signature[44564903] reset_type[01005254] [10118.283302] condition[0003610e] [10118.283310] bnx2 0000:02:00.0 eth0: DEBUG: 000001c0: 01005254 42530088 0003610e 00000000 [10118.283328] bnx2 0000:02:00.0 eth0: DEBUG: 000003cc: 44444444 44444444 44444444 00000a28 [10118.283346] bnx2 0000:02:00.0 eth0: DEBUG: 000003dc: 000cffff 00000000 ffff0000 00000000 [10118.283364] bnx2 0000:02:00.0 eth0: DEBUG: 000003ec: 00000000 00000000 00000000 00000000 [10118.283379] bnx2 0000:02:00.0 eth0: DEBUG: 0x3fc[0000ffff] [10118.283389] bnx2 0000:02:00.0 eth0: <--- end MCP states dump ---> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer 2014-02-05 20:23 ` Zoltan Kiss 2014-02-05 20:27 ` Zoltan Kiss @ 2014-02-05 20:43 ` Andrew Cooper 2014-02-06 9:58 ` Zoltan Kiss 1 sibling, 1 reply; 11+ messages in thread From: Andrew Cooper @ 2014-02-05 20:43 UTC (permalink / raw) To: Zoltan Kiss, Michael Chan Cc: Jeff Kirsher, Jesse Brandeburg, Bruce Allan, Carolyn Wyborny, Don Skidmore, Greg Rose, Peter P Waskiewicz Jr, Alex Duyck, John Ronciak, Tushar Dave, Akeem G Abodunrin, David S. Miller, e1000-devel, netdev, linux-kernel, xen-devel On 05/02/2014 20:23, Zoltan Kiss wrote: > On 04/02/14 19:47, Michael Chan wrote: >> On Fri, 2014-01-31 at 14:29 +0100, Zoltan Kiss wrote: >>> [ 5417.275472] WARNING: at net/sched/sch_generic.c:255 >>> dev_watchdog+0x156/0x1f0() >>> [ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out >> >> The dump shows an internal IRQ pending on MSIX vector 2 which matches >> the the queue number that is timing out. I don't know what happened to >> the MSIX and why the driver is not seeing it. Do you see an IRQ error >> message from the kernel a few seconds before the tx timeout message? > > I haven't seen any IRQ related error message. Note, this is on Xen > 4.3.1. Now I have new results with a reworked version of the patch, > unfortunately it still has this issue. Here is a bnx2 dump, lspci > output and some Xen debug output (MSI and interrupt bindings, I have > more if needed). You need debug-keys 'Q' as well to map between the PCI devices and Xen IRQs ~Andrew ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer 2014-02-05 20:43 ` Andrew Cooper @ 2014-02-06 9:58 ` Zoltan Kiss 0 siblings, 0 replies; 11+ messages in thread From: Zoltan Kiss @ 2014-02-06 9:58 UTC (permalink / raw) To: Andrew Cooper, Michael Chan Cc: Jeff Kirsher, Jesse Brandeburg, Bruce Allan, Carolyn Wyborny, Don Skidmore, Greg Rose, Peter P Waskiewicz Jr, Alex Duyck, John Ronciak, Tushar Dave, Akeem G Abodunrin, David S. Miller, e1000-devel, netdev, linux-kernel, xen-devel On 05/02/14 20:43, Andrew Cooper wrote: > On 05/02/2014 20:23, Zoltan Kiss wrote: >> On 04/02/14 19:47, Michael Chan wrote: >>> On Fri, 2014-01-31 at 14:29 +0100, Zoltan Kiss wrote: >>>> [ 5417.275472] WARNING: at net/sched/sch_generic.c:255 >>>> dev_watchdog+0x156/0x1f0() >>>> [ 5417.275474] NETDEV WATCHDOG: eth1 (bnx2): transmit queue 2 timed out >>> >>> The dump shows an internal IRQ pending on MSIX vector 2 which matches >>> the the queue number that is timing out. I don't know what happened to >>> the MSIX and why the driver is not seeing it. Do you see an IRQ error >>> message from the kernel a few seconds before the tx timeout message? >> >> I haven't seen any IRQ related error message. Note, this is on Xen >> 4.3.1. Now I have new results with a reworked version of the patch, >> unfortunately it still has this issue. Here is a bnx2 dump, lspci >> output and some Xen debug output (MSI and interrupt bindings, I have >> more if needed). > > You need debug-keys 'Q' as well to map between the PCI devices and Xen IRQs > > ~Andrew > I could have it after reboot: (XEN) [2014-02-06 09:44:34] 0000:02:00.0 - dom 0 - MSIs < 64 65 66 67 68 69 > So the relevant MSI informations: (XEN) [2014-02-05 20:15:20] MSI-X 64 vec=d7 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 65 vec=ba fixed edge assert phys cpu dest=00000000 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 66 vec=92 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 67 vec=3a fixed edge assert phys cpu dest=00000021 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 68 vec=b8 fixed edge assert phys cpu dest=00000022 mask=1/0/0 (XEN) [2014-02-05 20:15:20] MSI-X 69 vec=2a fixed edge assert phys cpu dest=00000020 mask=1/1/1 ... (XEN) [2014-02-05 20:15:22] IRQ: 64 affinity:00000004 vec:d7 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:304(---), (XEN) [2014-02-05 20:15:22] IRQ: 65 affinity:00000100 vec:ba type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:303(---), (XEN) [2014-02-05 20:15:22] IRQ: 66 affinity:00000004 vec:92 type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:302(---), (XEN) [2014-02-05 20:15:22] IRQ: 67 affinity:00000002 vec:3a type=PCI-MSI/-X status=00000010 in-flight=0 domain-list=0:301(---), (XEN) [2014-02-05 20:15:22] IRQ: 68 affinity:00000004 vec:b8 type=PCI-MSI/-X status=00000030 in-flight=0 domain-list=0:300(---), (XEN) [2014-02-05 20:15:22] IRQ: 69 affinity:00000001 vec:2a type=PCI-MSI/-X status=00000002 mapped, unbound Zoli ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer 2014-01-30 19:08 igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer Zoltan Kiss 2014-01-30 20:34 ` Michael Chan @ 2014-01-31 18:56 ` Wei Liu 2014-02-04 21:32 ` Zoltan Kiss 2014-02-12 17:13 ` Zoltan Kiss 2 siblings, 1 reply; 11+ messages in thread From: Wei Liu @ 2014-01-31 18:56 UTC (permalink / raw) To: Zoltan Kiss Cc: Jeff Kirsher, Jesse Brandeburg, Bruce Allan, Carolyn Wyborny, Don Skidmore, Greg Rose, Peter P Waskiewicz Jr, Alex Duyck, John Ronciak, Tushar Dave, Akeem G Abodunrin, David S. Miller, e1000-devel, netdev, linux-kernel, Michael Chan, xen-devel, wei.liu2 On Thu, Jan 30, 2014 at 07:08:11PM +0000, Zoltan Kiss wrote: > Hi, > > I've experienced some queue timeout problems mentioned in the > subject with igb and bnx2 cards. I haven't seen them on other cards > so far. I'm using XenServer with 3.10 Dom0 kernel (however igb were > already updated to latest version), and there are Windows guests > sending data through these cards. I noticed these problems in XenRT > test runs, and I know that they usually mean some lost interrupt > problem or other hardware error, but in my case they started to > appear more often, and they are likely connected to my netback grant > mapping patches. These patches causing skb's with huge (~64kb) > linear buffers to appear more often. > The reason for that is an old problem in the ring protocol: > originally the maximum amount of slots were linked to MAX_SKB_FRAGS, > as every slot ended up as a frag of the skb. When this value were > changed, netback had to cope with the situation by coalescing the > packets into fewer frags. > My patch series take a different approach: the leftover slots > (pages) were assigned to a new skb's frags, and that skb were > stashed to the frag_list of the first one. Then, before sending it > off to the stack it calls skb = skb_copy_expand(skb, 0, 0, > GFP_ATOMIC, __GFP_NOWARN), which basically creates a new skb and > copied all the data into it. As far as I understood, it put > everything into the linear buffer, which can amount to 64KB at most. > The original skb are freed then, and this new one were sent to the > stack. Just my two cents, if it is this case, you can try to call skb_copy_expand on every SKB netback receives to manually create SKBs with ~64KB linear buffer to see how it goes... Wei. > I suspect that this is the problem as it only happens when guests > send too much slots. Does anyone familiar with these drivers have > seen such issue before? (when these kind of skb's get stucked in the > queue) > > Regards, > > Zoltan Kiss > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer 2014-01-31 18:56 ` Wei Liu @ 2014-02-04 21:32 ` Zoltan Kiss 0 siblings, 0 replies; 11+ messages in thread From: Zoltan Kiss @ 2014-02-04 21:32 UTC (permalink / raw) To: Wei Liu, Zoltan Kiss Cc: Jeff Kirsher, Jesse Brandeburg, Bruce Allan, Carolyn Wyborny, Don Skidmore, Greg Rose, Peter P Waskiewicz Jr, Alex Duyck, John Ronciak, Tushar Dave, Akeem G Abodunrin, David S. Miller, e1000-devel, netdev, linux-kernel, Michael Chan, xen-devel On 31/01/14 18:56, Wei Liu wrote: > On Thu, Jan 30, 2014 at 07:08:11PM +0000, Zoltan Kiss wrote: >> Hi, >> >> I've experienced some queue timeout problems mentioned in the >> subject with igb and bnx2 cards. I haven't seen them on other cards >> so far. I'm using XenServer with 3.10 Dom0 kernel (however igb were >> already updated to latest version), and there are Windows guests >> sending data through these cards. I noticed these problems in XenRT >> test runs, and I know that they usually mean some lost interrupt >> problem or other hardware error, but in my case they started to >> appear more often, and they are likely connected to my netback grant >> mapping patches. These patches causing skb's with huge (~64kb) >> linear buffers to appear more often. >> The reason for that is an old problem in the ring protocol: >> originally the maximum amount of slots were linked to MAX_SKB_FRAGS, >> as every slot ended up as a frag of the skb. When this value were >> changed, netback had to cope with the situation by coalescing the >> packets into fewer frags. >> My patch series take a different approach: the leftover slots >> (pages) were assigned to a new skb's frags, and that skb were >> stashed to the frag_list of the first one. Then, before sending it >> off to the stack it calls skb = skb_copy_expand(skb, 0, 0, >> GFP_ATOMIC, __GFP_NOWARN), which basically creates a new skb and >> copied all the data into it. As far as I understood, it put >> everything into the linear buffer, which can amount to 64KB at most. >> The original skb are freed then, and this new one were sent to the >> stack. > > Just my two cents, if it is this case, you can try to call > skb_copy_expand on every SKB netback receives to manually create SKBs > with ~64KB linear buffer to see how it goes... I've tried it, and it did break everything in a similar way, so that's a strong clue that the problem lies here. I've rewrote that part of my patches to do less modification, based on Malcolm's idea: netback pulls the first frag into linear buffer, then moves a frag from the frag_list skb into the first one. That seems to help, but so far I have only one relevant test result, I'm waiting for more results. Zoli ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer 2014-01-30 19:08 igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer Zoltan Kiss 2014-01-30 20:34 ` Michael Chan 2014-01-31 18:56 ` Wei Liu @ 2014-02-12 17:13 ` Zoltan Kiss 2 siblings, 0 replies; 11+ messages in thread From: Zoltan Kiss @ 2014-02-12 17:13 UTC (permalink / raw) To: Zoltan Kiss, Jeff Kirsher, Jesse Brandeburg, Bruce Allan, Carolyn Wyborny, Don Skidmore, Greg Rose, Peter P Waskiewicz Jr, Alex Duyck, John Ronciak, Tushar Dave, Akeem G Abodunrin, David S. Miller, e1000-devel, netdev, linux-kernel, Michael Chan, xen-devel Hi, I still haven't managed to crack this problem. I've made sure the below mentioned skb's look the same as the other ones: linear buffer with header, and the rest is aggregated into frags. Utilizing the skb destructor I've also checked that these packets are all freed before the TX hang happens. So the only difference from current upstream is that the pages are grant mapped into Dom0 instead of grant copy to a local page. I've also found some of my older notes about this issue, where I managed to reproduce this on igb, and in that particular case the TX hang could be solved with ifconfig down/up. Does the "Detected Tx Unit Hang" messages give any hint to igb developers? Nov 26 04:18:34 localhost kernel: [ 7814.197868] ------------[ cut here ]------------ Nov 26 04:18:34 localhost kernel: [ 7814.197889] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x165/0x220() Nov 26 04:18:34 localhost kernel: [ 7814.197892] NETDEV WATCHDOG: eth0 (igb): transmit queue 7 timed out Nov 26 04:18:34 localhost kernel: [ 7814.197894] Modules linked in: tun nfsv3 nfs_acl nfs fscache dm_multipath scsi_dh lockd sunrpc openvswitch ipt_REJECT nf_conntrack_ipv4 nf_defrag_ip v4 xt_tcpudp xt_conntrack nf_conntrack iptable_filter ip_tables x_tables nls_utf8 isofs dm_mirror video backlight sbs sbshc hed acpi_ipmi ipmi_msghandler nvram sg psmouse serio_raw igb i2c_algo_bit ptp pps_core hpilo tpm_tis tpm tpm_bios lpc_ich mfd_core ehci_pci crc32_pclmul aesni_intel ablk_helper cryptd lrw aes_i586 xts gf128mul dm_region_hash dm_log dm_mod shpchp hpsa sd_mod scsi_mod uhci_hcd ohci_hcd ehci_hcd fbcon font tileblit bitblit softcursor [last unloaded: microcode] Nov 26 04:18:34 localhost kernel: [ 7814.197957] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 3.10.11-0.xs1.8.50.127.377543 #1 Nov 26 04:18:34 localhost kernel: [ 7814.197959] Hardware name: HP ProLiant BL420c Gen8, BIOS I30 12/14/2012 Nov 26 04:18:34 localhost kernel: [ 7814.197962] e5cd9e10 c13e4c55 e5cd9ddc c1278546 e5cd9e00 c1047fd3 c1643220 e5cd9e2c Nov 26 04:18:34 localhost kernel: [ 7814.197969] 000000ff c13e4c55 e1fa8700 00000007 000004e2 e5cd9e18 c1048093 00000009 Nov 26 04:18:34 localhost kernel: [ 7814.197975] e5cd9e10 c1643220 e5cd9e2c e5cd9e50 c13e4c55 c163fe6b 000000ff c1643220 Nov 26 04:18:34 localhost kernel: [ 7814.197982] Call Trace: Nov 26 04:18:34 localhost kernel: [ 7814.197988] [<c13e4c55>] ? dev_watchdog+0x165/0x220 Nov 26 04:18:34 localhost kernel: [ 7814.197994] [<c1278546>] dump_stack+0x16/0x20 Nov 26 04:18:34 localhost kernel: [ 7814.198000] [<c1047fd3>] warn_slowpath_common+0x63/0x80 Nov 26 04:18:34 localhost kernel: [ 7814.198003] [<c13e4c55>] ? dev_watchdog+0x165/0x220 Nov 26 04:18:34 localhost kernel: [ 7814.198007] [<c1048093>] warn_slowpath_fmt+0x33/0x40 Nov 26 04:18:34 localhost kernel: [ 7814.198011] [<c13e4c55>] dev_watchdog+0x165/0x220 Nov 26 04:18:34 localhost kernel: [ 7814.198017] [<c13e4af0>] ? dev_activate+0x110/0x110 Nov 26 04:18:34 localhost kernel: [ 7814.198020] [<c1055c18>] call_timer_fn+0x58/0xe0 Nov 26 04:18:34 localhost kernel: [ 7814.198024] [<c1056ce8>] run_timer_softirq+0x1a8/0x1f0 Nov 26 04:18:34 localhost kernel: [ 7814.198028] [<c12fb61d>] ? info_for_irq+0xd/0x20 Nov 26 04:18:34 localhost kernel: [ 7814.198031] [<c12fbb6c>] ? evtchn_from_irq+0x3c/0x50 Nov 26 04:18:34 localhost kernel: [ 7814.198034] [<c13e4af0>] ? dev_activate+0x110/0x110 Nov 26 04:18:34 localhost kernel: [ 7814.198038] [<c104fcb9>] __do_softirq+0xd9/0x1e0 Nov 26 04:18:34 localhost kernel: [ 7814.198041] [<c12fc045>] ? __xen_evtchn_do_upcall+0x245/0x280 Nov 26 04:18:34 localhost kernel: [ 7814.198045] [<c104fe41>] irq_exit+0x41/0x80 Nov 26 04:18:34 localhost kernel: [ 7814.198048] [<c12fc0e5>] xen_evtchn_do_upcall+0x25/0x30 Nov 26 04:18:34 localhost kernel: [ 7814.198053] [<c147b287>] xen_do_upcall+0x7/0xc Nov 26 04:18:34 localhost kernel: [ 7814.198058] [<c10c00d8>] ? rcu_process_gp_end+0x58/0x70 Nov 26 04:18:34 localhost kernel: [ 7814.198061] [<c10013a7>] ? xen_hypercall_sched_op+0x7/0x20 Nov 26 04:18:34 localhost kernel: [ 7814.198066] [<c1007ef2>] ? xen_safe_halt+0x12/0x20 Nov 26 04:18:34 localhost kernel: [ 7814.198070] [<c1015be6>] default_idle+0x56/0xb0 Nov 26 04:18:34 localhost kernel: [ 7814.198074] [<c10158e7>] arch_cpu_idle+0x17/0x30 Nov 26 04:18:34 localhost kernel: [ 7814.198078] [<c108e2ae>] cpu_startup_entry+0x15e/0x1d0 Nov 26 04:18:34 localhost kernel: [ 7814.198085] [<c1464282>] cpu_bringup_and_idle+0x12/0x20 Nov 26 04:18:34 localhost kernel: [ 7814.198088] ---[ end trace d8c0d3f5c187aa6b ]--- And the recovery: Nov 26 21:47:54 localhost kernel: [70773.950715] ------------[ cut here ]------------ Nov 26 21:47:54 localhost kernel: [70773.950747] WARNING: at net/core/dev.c:4201 net_rx_action+0xfd/0x1c0() Nov 26 21:47:54 localhost kernel: [70773.950751] Modules linked in: tun nfsv3 nfs_acl nfs fscache dm_multipath scsi_dh lockd sunrpc openvswitch ipt_REJECT nf_conntrack_ipv4 nf_defrag_ip v4 xt_tcpudp xt_conntrack nf_conntrack iptable_filter ip_tables x_tables nls_utf8 isofs dm_mirror video backlight sbs sbshc hed acpi_ipmi ipmi_msghandler nvram sg psmouse serio_raw igb i2c_algo_bit ptp pps_core hpilo tpm_tis tpm tpm_bios lpc_ich mfd_core ehci_pci crc32_pclmul aesni_intel ablk_helper cryptd lrw aes_i586 xts gf128mul dm_region_hash dm_log dm_mod shpchp hpsa sd_mod scsi_mod uhci_hcd ohci_hcd ehci_hcd fbcon font tileblit bitblit softcursor [last unloaded: microcode] Nov 26 21:47:54 localhost kernel: [70773.950852] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.10.11-0.xs1.8.50.127.377543 #1 Nov 26 21:47:54 localhost kernel: [70773.950856] Hardware name: HP ProLiant BL420c Gen8, BIOS I30 12/14/2012 Nov 26 21:47:54 localhost kernel: [70773.950860] 00000000 c13ccdfd c167fc78 c1278546 c167fc9c c1047fd3 c15ebc78 c163f7da Nov 26 21:47:54 localhost kernel: [70773.950873] 00001069 c13ccdfd dff404c8 00000040 00000000 c167fcac c1048012 00000009 Nov 26 21:47:54 localhost kernel: [70773.950884] 00000000 c167fcd8 c13ccdfd ed383888 010cbb97 000000e2 ed383880 00000043 Nov 26 21:47:54 localhost kernel: [70773.950896] Call Trace: Nov 26 21:47:54 localhost kernel: [70773.950905] [<c13ccdfd>] ? net_rx_action+0xfd/0x1c0 Nov 26 21:47:54 localhost kernel: [70773.950915] [<c1278546>] dump_stack+0x16/0x20 Nov 26 21:47:54 localhost kernel: [70773.950924] [<c1047fd3>] warn_slowpath_common+0x63/0x80 Nov 26 21:47:54 localhost kernel: [70773.950930] [<c13ccdfd>] ? net_rx_action+0xfd/0x1c0 Nov 26 21:47:54 localhost kernel: [70773.950937] [<c1048012>] warn_slowpath_null+0x22/0x30 Nov 26 21:47:54 localhost kernel: [70773.950954] [<c13ccdfd>] net_rx_action+0xfd/0x1c0 Nov 26 21:47:54 localhost kernel: [70773.950969] [<c104fcb9>] __do_softirq+0xd9/0x1e0 Nov 26 21:47:54 localhost kernel: [70773.950985] [<c12fc045>] ? __xen_evtchn_do_upcall+0x245/0x280 Nov 26 21:47:54 localhost kernel: [70773.951002] [<c104fe41>] irq_exit+0x41/0x80 Nov 26 21:47:54 localhost kernel: [70773.951011] [<c12fc0e5>] xen_evtchn_do_upcall+0x25/0x30 Nov 26 21:47:54 localhost kernel: [70773.951019] [<c147b287>] xen_do_upcall+0x7/0xc Nov 26 21:47:54 localhost kernel: [70773.951026] [<c10013a7>] ? xen_hypercall_sched_op+0x7/0x20 Nov 26 21:47:54 localhost kernel: [70773.951033] [<c1007ef2>] ? xen_safe_halt+0x12/0x20 Nov 26 21:47:54 localhost kernel: [70773.951041] [<c1015be6>] default_idle+0x56/0xb0 Nov 26 21:47:54 localhost kernel: [70773.951046] [<c10158e7>] arch_cpu_idle+0x17/0x30 Nov 26 21:47:54 localhost kernel: [70773.951054] [<c108e2ae>] cpu_startup_entry+0x15e/0x1d0 Nov 26 21:47:54 localhost kernel: [70773.951064] [<c1460362>] rest_init+0x62/0x70 Nov 26 21:47:54 localhost kernel: [70773.951071] [<c16efcea>] start_kernel+0x39a/0x3b0 Nov 26 21:47:54 localhost kernel: [70773.951076] [<c16ef520>] ? repair_env_string+0x60/0x60 Nov 26 21:47:54 localhost kernel: [70773.951082] [<c16ef2eb>] i386_start_kernel+0x8b/0x90 Nov 26 21:47:54 localhost kernel: [70773.951088] [<c16f2c2d>] xen_start_kernel+0x7cd/0x7f0 Nov 26 21:47:54 localhost kernel: [70773.951097] ---[ end trace d8c0d3f5c187aa6c ]--- Nov 26 21:47:54 localhost kernel: [70773.952034] ------------[ cut here ]------------ Nov 26 21:47:54 localhost kernel: [70773.952067] WARNING: at drivers/net/ethernet/intel/igb/igb_main.c:2860 __igb_close+0x3d/0xb0 [igb]() Nov 26 21:47:54 localhost kernel: [70773.952071] Modules linked in: tun nfsv3 nfs_acl nfs fscache dm_multipath scsi_dh lockd sunrpc openvswitch ipt_REJECT nf_conntrack_ipv4 nf_defrag_ip v4 xt_tcpudp xt_conntrack nf_conntrack iptable_filter ip_tables x_tables nls_utf8 isofs dm_mirror video backlight sbs sbshc hed acpi_ipmi ipmi_msghandler nvram sg psmouse serio_raw igb i2c_algo_bit ptp pps_core hpilo tpm_tis tpm tpm_bios lpc_ich mfd_core ehci_pci crc32_pclmul aesni_intel ablk_helper cryptd lrw aes_i586 xts gf128mul dm_region_hash dm_log dm_mod shpchp hpsa sd_mod scsi_mod uhci_hcd ohci_hcd ehci_hcd fbcon font tileblit bitblit softcursor [last unloaded: microcode] Nov 26 21:47:54 localhost kernel: [70773.952150] CPU: 4 PID: 3467 Comm: ifconfig Tainted: G W 3.10.11-0.xs1.8.50.127.377543 #1 Nov 26 21:47:54 localhost kernel: [70773.952153] Hardware name: HP ProLiant BL420c Gen8, BIOS I30 12/14/2012 Nov 26 21:47:54 localhost kernel: [70773.952157] 00000000 eddcec4d ca701d8c c1278546 ca701db0 c1047fd3 c15ebc78 edde1b0c Nov 26 21:47:54 localhost kernel: [70773.952169] 00000b2c eddcec4d 00000000 e35504c0 e5f17000 ca701dc0 c1048012 00000009 Nov 26 21:47:54 localhost kernel: [70773.952180] 00000000 ca701dd4 eddcec4d e3550000 ca701e00 ca701e00 ca701ddc eddceccf Nov 26 21:47:54 localhost kernel: [70773.952192] Call Trace: Nov 26 21:47:54 localhost kernel: [70773.952207] [<eddcec4d>] ? __igb_close+0x3d/0xb0 [igb] Nov 26 21:47:54 localhost kernel: [70773.952216] [<c1278546>] dump_stack+0x16/0x20 Nov 26 21:47:54 localhost kernel: [70773.952223] [<c1047fd3>] warn_slowpath_common+0x63/0x80 Nov 26 21:47:54 localhost kernel: [70773.952237] [<eddcec4d>] ? __igb_close+0x3d/0xb0 [igb] Nov 26 21:47:54 localhost kernel: [70773.952243] [<c1048012>] warn_slowpath_null+0x22/0x30 Nov 26 21:47:54 localhost kernel: [70773.952255] [<eddcec4d>] __igb_close+0x3d/0xb0 [igb] Nov 26 21:47:54 localhost kernel: [70773.952267] [<eddceccf>] igb_close+0xf/0x20 [igb] Nov 26 21:47:54 localhost kernel: [70773.952275] [<c13c8691>] __dev_close_many+0x91/0xb0 Nov 26 21:47:54 localhost kernel: [70773.952284] [<c13df583>] ? netpoll_rx_disable+0x43/0x50 Nov 26 21:47:54 localhost kernel: [70773.952289] [<c13c9163>] __dev_close+0x43/0x80 Nov 26 21:47:54 localhost kernel: [70773.952300] [<c13c7c28>] __dev_change_flags+0xa8/0x120 Nov 26 21:47:54 localhost kernel: [70773.952308] [<c13c85c3>] dev_change_flags+0x23/0x60 Nov 26 21:47:54 localhost kernel: [70773.952314] [<c1424d9c>] devinet_ioctl+0x29c/0x600 Nov 26 21:47:54 localhost kernel: [70773.952323] [<c13dbf05>] ? dev_ioctl+0x475/0x4d0 Nov 26 21:47:54 localhost kernel: [70773.952330] [<c1425d6b>] inet_ioctl+0x5b/0x80 Nov 26 21:47:54 localhost kernel: [70773.952340] [<c13b776e>] sock_ioctl+0x1fe/0x230 Nov 26 21:47:54 localhost kernel: [70773.952350] [<c13b7570>] ? sock_recvmsg_nosec+0xb0/0xb0 Nov 26 21:47:54 localhost kernel: [70773.952360] [<c1143cf6>] vfs_ioctl+0x26/0x40 Nov 26 21:47:54 localhost kernel: [70773.952367] [<c11448ba>] do_vfs_ioctl+0x4ea/0x550 Nov 26 21:47:54 localhost kernel: [70773.952376] [<c113de22>] ? final_putname+0x32/0x40 Nov 26 21:47:54 localhost kernel: [70773.952382] [<c113de22>] ? final_putname+0x32/0x40 Nov 26 21:47:54 localhost kernel: [70773.952391] [<c113de67>] ? putname+0x37/0x40 Nov 26 21:47:54 localhost kernel: [70773.952401] [<c1134b64>] ? do_sys_open+0x194/0x1a0 Nov 26 21:47:54 localhost kernel: [70773.952408] [<c1144983>] SyS_ioctl+0x63/0x90 Nov 26 21:47:54 localhost kernel: [70773.952416] [<c147ad4d>] sysenter_do_call+0x12/0x28 Nov 26 21:47:54 localhost kernel: [70773.952423] ---[ end trace d8c0d3f5c187aa6d ]--- Nov 26 21:47:54 localhost kernel: [70773.971294] igb 0000:04:00.1 eth1: Reset adapter Nov 26 21:47:54 localhost kernel: [70774.068154] igb 0000:04:00.0 eth0: Reset adapter Nov 26 21:47:55 localhost kernel: [70774.357949] igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX Nov 26 21:48:00 localhost kernel: [70779.231904] igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX Nov 26 21:48:00 localhost kernel: [70779.346793] igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX Nov 26 21:48:02 localhost kernel: [70781.214844] igb 0000:04:00.0: Detected Tx Unit Hang Nov 26 21:48:02 localhost kernel: [70781.214844] Tx Queue <7> Nov 26 21:48:02 localhost kernel: [70781.214844] TDH <0> Nov 26 21:48:02 localhost kernel: [70781.214844] TDT <0> Nov 26 21:48:02 localhost kernel: [70781.214844] next_to_use <1> Nov 26 21:48:02 localhost kernel: [70781.214844] next_to_clean <0> Nov 26 21:48:02 localhost kernel: [70781.214844] buffer_info[next_to_clean] Nov 26 21:48:02 localhost kernel: [70781.214844] time_stamp <10cc0cd> Nov 26 21:48:02 localhost kernel: [70781.214844] next_to_watch <e2d5e000> Nov 26 21:48:02 localhost kernel: [70781.214844] jiffies <10cc2ae> Nov 26 21:48:02 localhost kernel: [70781.214844] desc.status <12c000> Nov 26 21:48:04 localhost kernel: [70783.214857] igb 0000:04:00.0: Detected Tx Unit Hang Nov 26 21:48:04 localhost kernel: [70783.214857] Tx Queue <7> Nov 26 21:48:04 localhost kernel: [70783.214857] TDH <0> Nov 26 21:48:04 localhost kernel: [70783.214857] TDT <0> Nov 26 21:48:04 localhost kernel: [70783.214857] next_to_use <1> Nov 26 21:48:04 localhost kernel: [70783.214857] next_to_clean <0> Nov 26 21:48:04 localhost kernel: [70783.214857] buffer_info[next_to_clean] Nov 26 21:48:04 localhost kernel: [70783.214857] time_stamp <10cc0cd> Nov 26 21:48:04 localhost kernel: [70783.214857] next_to_watch <e2d5e000> Nov 26 21:48:04 localhost kernel: [70783.214857] jiffies <10cc4a2> Nov 26 21:48:04 localhost kernel: [70783.214857] desc.status <12c000> Nov 26 21:48:06 localhost kernel: [70785.214700] igb 0000:04:00.0: Detected Tx Unit Hang Nov 26 21:48:06 localhost kernel: [70785.214700] Tx Queue <7> Nov 26 21:48:06 localhost kernel: [70785.214700] TDH <0> Nov 26 21:48:06 localhost kernel: [70785.214700] TDT <0> Nov 26 21:48:06 localhost kernel: [70785.214700] next_to_use <1> Nov 26 21:48:06 localhost kernel: [70785.214700] next_to_clean <0> Nov 26 21:48:06 localhost kernel: [70785.214700] buffer_info[next_to_clean] Nov 26 21:48:06 localhost kernel: [70785.214700] time_stamp <10cc0cd> Nov 26 21:48:06 localhost kernel: [70785.214700] next_to_watch <e2d5e000> Nov 26 21:48:06 localhost kernel: [70785.214700] jiffies <10cc696> Nov 26 21:48:06 localhost kernel: [70785.214700] desc.status <12c000> Nov 26 21:48:08 localhost kernel: [70787.214734] igb 0000:04:00.0: Detected Tx Unit Hang Nov 26 21:48:08 localhost kernel: [70787.214734] Tx Queue <7> Nov 26 21:48:08 localhost kernel: [70787.214734] TDH <0> Nov 26 21:48:08 localhost kernel: [70787.214734] TDT <0> Nov 26 21:48:08 localhost kernel: [70787.214734] next_to_use <1> Nov 26 21:48:08 localhost kernel: [70787.214734] next_to_clean <0> Nov 26 21:48:08 localhost kernel: [70787.214734] buffer_info[next_to_clean] Nov 26 21:48:08 localhost kernel: [70787.214734] time_stamp <10cc0cd> Nov 26 21:48:08 localhost kernel: [70787.214734] next_to_watch <e2d5e000> Nov 26 21:48:08 localhost kernel: [70787.214734] jiffies <10cc88a> Nov 26 21:48:08 localhost kernel: [70787.214734] desc.status <12c000> Nov 26 21:48:10 localhost kernel: [70789.214752] igb 0000:04:00.0: Detected Tx Unit Hang Nov 26 21:48:10 localhost kernel: [70789.214752] Tx Queue <7> Nov 26 21:48:10 localhost kernel: [70789.214752] TDH <0> Nov 26 21:48:10 localhost kernel: [70789.214752] TDT <0> Nov 26 21:48:10 localhost kernel: [70789.214752] next_to_use <1> Nov 26 21:48:10 localhost kernel: [70789.214752] next_to_clean <0> Nov 26 21:48:10 localhost kernel: [70789.214752] buffer_info[next_to_clean] Nov 26 21:48:10 localhost kernel: [70789.214752] time_stamp <10cc0cd> Nov 26 21:48:10 localhost kernel: [70789.214752] next_to_watch <e2d5e000> Nov 26 21:48:10 localhost kernel: [70789.214752] jiffies <10cca7e> Nov 26 21:48:10 localhost kernel: [70789.214752] desc.status <12c000> Nov 26 21:48:11 localhost kernel: [70790.214611] igb 0000:04:00.0 eth0: Reset adapter Nov 26 21:48:11 localhost kernel: [70790.246610] igb 0000:04:00.1 eth1: Reset adapter Nov 26 21:48:11 localhost kernel: [70790.250616] igb: eth1 NIC Link is Down Nov 26 21:48:11 localhost kernel: [70790.340089] igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX Nov 26 21:48:11 localhost kernel: [70790.367984] igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX Nov 26 21:48:11 localhost kernel: [70790.598550] igb: eth1 NIC Link is Down Nov 26 21:48:11 localhost kernel: [70790.634559] igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX Nov 26 21:48:11 localhost kernel: [70790.638593] igb: eth0 NIC Link is Down Nov 26 21:48:11 localhost kernel: [70790.674599] igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX On 30/01/14 19:08, Zoltan Kiss wrote: > I've experienced some queue timeout problems mentioned in the subject > with igb and bnx2 cards. I haven't seen them on other cards so far. I'm > using XenServer with 3.10 Dom0 kernel (however igb were already updated > to latest version), and there are Windows guests sending data through > these cards. I noticed these problems in XenRT test runs, and I know > that they usually mean some lost interrupt problem or other hardware > error, but in my case they started to appear more often, and they are > likely connected to my netback grant mapping patches. These patches > causing skb's with huge (~64kb) linear buffers to appear more often. > The reason for that is an old problem in the ring protocol: originally > the maximum amount of slots were linked to MAX_SKB_FRAGS, as every slot > ended up as a frag of the skb. When this value were changed, netback had > to cope with the situation by coalescing the packets into fewer frags. > My patch series take a different approach: the leftover slots (pages) > were assigned to a new skb's frags, and that skb were stashed to the > frag_list of the first one. Then, before sending it off to the stack it > calls skb = skb_copy_expand(skb, 0, 0, GFP_ATOMIC, __GFP_NOWARN), which > basically creates a new skb and copied all the data into it. As far as I > understood, it put everything into the linear buffer, which can amount > to 64KB at most. The original skb are freed then, and this new one were > sent to the stack. > I suspect that this is the problem as it only happens when guests send > too much slots. Does anyone familiar with these drivers have seen such > issue before? (when these kind of skb's get stucked in the queue) ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2014-02-12 17:14 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-01-30 19:08 igb and bnx2: "NETDEV WATCHDOG: transmit queue timed out" when skb has huge linear buffer Zoltan Kiss 2014-01-30 20:34 ` Michael Chan 2014-01-31 13:29 ` Zoltan Kiss 2014-02-04 19:47 ` Michael Chan 2014-02-05 20:23 ` Zoltan Kiss 2014-02-05 20:27 ` Zoltan Kiss 2014-02-05 20:43 ` Andrew Cooper 2014-02-06 9:58 ` Zoltan Kiss 2014-01-31 18:56 ` Wei Liu 2014-02-04 21:32 ` Zoltan Kiss 2014-02-12 17:13 ` Zoltan Kiss
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).