* e1000e hardware unit hangs
From: Ben Greear @ 2018-01-23 23:46 UTC
  To: netdev

Hello,

Does anyone have any more suggestions for making e1000e work better?  This is from a 4.9.65+ kernel,
with these additional e1000e patches applied:

e1000e: Fix error path in link detection
e1000e: Fix wrong comment related to link detection
e1000e: Fix return value test
e1000e: Separate signaling for link check/link up
e1000e: Avoid receiver overrun interrupt bursts

The test case is simply to run 30000 TCP connections, each trying to send 56 Kbps of bidirectional
data, between a pair of e1000e interfaces :)
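For scale, the aggregate load implied by that test case can be worked out with a few lines of arithmetic. This is only a rough sketch: it assumes "56Kbps" means 56,000 bits per second in each direction for every connection, and it ignores TCP/IP and Ethernet overhead. If the figure is instead the total for both directions combined, the result halves. The 1000 Mbps link rate is taken from the link-up messages in the log below.

```python
# Rough offered-load arithmetic for the test case described above.
# Assumption: 56 Kbps = 56,000 bits/s per connection, per direction.
CONNECTIONS = 30_000
RATE_BPS = 56_000            # assumed per-connection, per-direction rate
LINK_BPS = 1_000_000_000     # 1000 Mbps, from the "NIC Link is Up" log lines


def offered_load_bps(connections: int, rate_bps: int) -> int:
    """Aggregate payload rate in one direction, ignoring protocol overhead."""
    return connections * rate_bps


load = offered_load_bps(CONNECTIONS, RATE_BPS)
print(f"offered load per direction: {load / 1e9:.2f} Gbit/s")
print(f"ratio to link rate: {load / LINK_BPS:.2f}")
```

Under the per-direction assumption the connections together ask for more than the link can carry, so the transmit queues would be expected to stay full for the whole run.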

No OOM-related issues are seen on this kernel. A similar test on 4.13 showed some OOM
issues, but I have not debugged that yet.


Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth3 (e1000e): transmit queue 0 timed out, trans_start: 4294737199, wd-timeout: 5000 jiffies: 4294745088 tx-queues: 1
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth2 (e1000e): transmit queue 0 timed out, trans_start: 4294737200, wd-timeout: 5000 jiffies: 4294745088 tx-queues: 1
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: ------------[ cut here ]------------
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: WARNING: CPU: 7 PID: 0 at /home/greearb/git/linux-4.9.dev.y/net/sched/sch_generic.c:322 dev_watchdog+0x267/0x270
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 cfg80211 bnep bluetooth macvlan wanlink(O) pktgen fuse corete...sunrpc ipmi_d
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: CPU: 7 PID: 0 Comm: swapper/7 Tainted: G           O    4.9.65+ #21
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: Hardware name: Supermicro X9SCI/X9SCA/X9SCI/X9SCA, BIOS 2.0b 09/17/2012
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  ffff88042fdc3df0 ffffffff8142d791 0000000000000000 0000000000000000
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  ffff88042fdc3e30 ffffffff8110f266 000001422fdc3e08 0000000000000000
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  0000000000001388 00000000fffc7d30 ffff880417d0c000 00000000fffc9c00
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: Call Trace:
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  <IRQ>
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff8142d791>] dump_stack+0x63/0x82
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff8110f266>] __warn+0xc6/0xe0
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff8110f338>] warn_slowpath_null+0x18/0x20
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff817da497>] dev_watchdog+0x267/0x270
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff817da230>] ? qdisc_rcu_free+0x40/0x40
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff8117bf70>] call_timer_fn+0x30/0x150
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff817da230>] ? qdisc_rcu_free+0x40/0x40
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff8117c350>] run_timer_softirq+0x1f0/0x450
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff81051021>] ? lapic_next_deadline+0x21/0x30
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff8118a54d>] ? clockevents_program_event+0x7d/0x120
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff81115101>] __do_softirq+0xc1/0x2c0
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff81115461>] irq_exit+0xb1/0xc0
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff81051c9d>] smp_apic_timer_interrupt+0x3d/0x50
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff81895842>] apic_timer_interrupt+0x82/0x90
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  <EOI>
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff81726e46>] ? cpuidle_enter_state+0x126/0x300
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff81727042>] cpuidle_enter+0x12/0x20
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff811521ce>] call_cpuidle+0x1e/0x40
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff8115240a>] cpu_startup_entry+0x13a/0x220
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel:  [<ffffffff8104fbd9>] start_secondary+0x149/0x170
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: ---[ end trace 69e31de175b59d4f ]---
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:06:00.0 eth2: Reset adapter unexpectedly
Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:07:00.0 eth3: Reset adapter unexpectedly
Jan 23 15:39:02 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 23 15:39:02 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:06:00.0 eth2: Detected Hardware Unit Hang:
                                                       TDH                  <a8>
                                                       TDT                  <f3>...
Jan 23 15:39:02 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 23 15:39:13 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth3 (e1000e): transmit queue 0 timed out, trans_start: 4294748730, wd-timeout: 5000 jiffies: 4294759424 tx-queues: 1
Jan 23 15:39:13 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth2 (e1000e): transmit queue 0 timed out, trans_start: 4294748730, wd-timeout: 5000 jiffies: 4294759424 tx-queues: 1
Jan 23 15:39:13 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:07:00.0 eth3: Reset adapter unexpectedly
Jan 23 15:39:13 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:06:00.0 eth2: Reset adapter unexpectedly
Jan 23 15:39:20 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 23 15:39:20 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 23 15:39:25 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth2 (e1000e): transmit queue 0 timed out, trans_start: 4294766123, wd-timeout: 5000 jiffies: 4294771200 tx-queues: 1
Jan 23 15:39:25 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: eth3 (e1000e): transmit queue 0 timed out, trans_start: 4294766125, wd-timeout: 5000 jiffies: 4294771200 tx-queues: 1
Jan 23 15:39:25 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:06:00.0 eth2: Reset adapter unexpectedly
Jan 23 15:39:25 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:07:00.0 eth3: Reset adapter unexpectedly
Jan 23 15:39:28 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 23 15:39:28 lf1003-e3v2-13100124-f20x64 kernel: e1000e 0000:06:00.0 eth2: Detected Hardware Unit Hang:
                                                       TDH                  <c8>
                                                       TDT                  <f5>...
Jan 23 15:39:28 lf1003-e3v2-13100124-f20x64 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
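
The watchdog lines above carry enough fields to see how long each queue had been stalled when the timer fired. The sketch below parses one such line and subtracts trans_start from jiffies. It assumes, from the field names only, that all three values are in jiffies. The log does not state HZ, so nothing is converted to seconds. The extra trans_start/wd-timeout/jiffies fields are specific to this kernel's message format, and the regular expression and the helper name stall_jiffies are made up for illustration.

```python
import re

# Pattern for the extended watchdog message printed by this kernel.
# \s+ before "jiffies" tolerates the line being wrapped by the mailer.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<dev>\S+) \((?P<drv>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out, "
    r"trans_start: (?P<trans_start>\d+), wd-timeout: (?P<timeout>\d+)\s+"
    r"jiffies: (?P<jiffies>\d+)"
)

SAMPLE = (
    "Jan 23 15:38:59 lf1003-e3v2-13100124-f20x64 kernel: NETDEV WATCHDOG: "
    "eth3 (e1000e): transmit queue 0 timed out, trans_start: 4294737199, "
    "wd-timeout: 5000 jiffies: 4294745088 tx-queues: 1"
)


def stall_jiffies(line: str) -> tuple[str, int, int]:
    """Return (device, jiffies since trans_start, jiffies past the timeout)."""
    m = WATCHDOG_RE.search(line)
    if m is None:
        raise ValueError("not a NETDEV WATCHDOG line")
    elapsed = int(m["jiffies"]) - int(m["trans_start"])
    return m["dev"], elapsed, elapsed - int(m["timeout"])


dev, elapsed, overshoot = stall_jiffies(SAMPLE)
print(f"{dev}: {elapsed} jiffies since trans_start, {overshoot} past wd-timeout")
```

For the first eth3 entry this gives 7889 jiffies since the last recorded transmit start, which is 2889 beyond the 5000-jiffy timeout.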

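The two "Detected Hardware Unit Hang" reports give TDH and TDT, the transmit descriptor head and tail registers. The distance from head to tail, modulo the ring size, is the number of descriptors the driver has handed to the hardware that the hardware has not yet fetched. The sketch below computes that distance for both reports. It assumes the bracketed values are hexadecimal and assumes a ring of 256 descriptors. The ring size does not appear in the log and should be checked with `ethtool -g` on the interface. The rest of each hang report is elided in the log, so only these two registers are used.

```python
def pending_descriptors(tdh: int, tdt: int, ring_size: int) -> int:
    """Descriptors between head (hardware) and tail (driver), with wraparound."""
    return (tdt - tdh) % ring_size


RING_SIZE = 256  # assumption: not shown in the log, verify with `ethtool -g`

# (timestamp, TDH, TDT) copied from the two eth2 hang reports in the log.
REPORTS = (
    ("15:39:02", 0xA8, 0xF3),
    ("15:39:28", 0xC8, 0xF5),
)

for stamp, tdh, tdt in REPORTS:
    backlog = pending_descriptors(tdh, tdt, RING_SIZE)
    print(f"{stamp} eth2: TDH={tdh:#x} TDT={tdt:#x} pending={backlog}")
```

Under those assumptions eth2 had 75 descriptors outstanding at the first hang report and 45 at the second.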

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

Thread overview: 13+ messages
2018-01-23 23:46 e1000e hardware unit hangs Ben Greear
2018-01-24 16:11 ` Alexander Duyck
2018-01-24 16:11   ` [Intel-wired-lan] " Alexander Duyck
2018-01-24 16:34   ` Neftin, Sasha
2018-01-24 16:34     ` [Intel-wired-lan] " Neftin, Sasha
2018-01-24 18:31     ` Ben Greear
2018-01-24 18:31       ` [Intel-wired-lan] " Ben Greear
2018-01-24 18:38       ` Denys Fedoryshchenko
2018-01-24 18:38         ` [Intel-wired-lan] " Denys Fedoryshchenko
2018-01-24 18:41         ` Ben Greear
2018-01-24 18:41           ` [Intel-wired-lan] " Ben Greear
2018-01-25  8:29           ` Neftin, Sasha
2018-01-25  8:29             ` [Intel-wired-lan] " Neftin, Sasha
