* NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
From: Nick Pegg @ 2013-11-20 18:44 UTC
  To: linux-kernel

Hello,

I've been seeing some servers hit a condition where they receive a
large number of packets (over 500,000 per second, for example) which
causes a kernel panic due to a null pointer dereference. I've included
the tracebacks below.

I have not been able to reproduce this in my lab, but out in the field
I've seen this happen with kernel versions 3.7.6 through 3.9.2 (e1000e
driver versions 2.1.4-k through 2.2.14-k), running with Intel 82574L
NICs.

I've seen previous posts to this mailing list suggesting that this is
a hardware issue (the mitigation being turning TSO/GSO off), however
those tracebacks didn't show the interface getting unexpectedly reset,
causing the null pointer dereference. Is this possibly a problem with
the e1000e driver where it's not gracefully handling the reset?

Let me know if more information is needed. And please CC me in replies
since I'm not subscribed to this list. Thanks!

-Nick


--------
Nov 16 07:03:19 rx ------------[ cut here ]------------
Nov 16 07:03:19 rx WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x25b/0x270()
Nov 16 07:03:19 rx Hardware name: X8DT6
Nov 16 07:03:19 rx NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Nov 16 07:03:19 rx Modules linked in: xt_u32 xt_physdev ip6table_mangle ip6_tables ebt_comment ebt_set ebt_arp ebt_limit ebt_ip6 ebt_ip ip_set_hash_net ip_set ebtable_nat xen_gntdev bonding ebtable_filter 8021q mrp e1000e ptp pps_core
Nov 16 07:03:19 rx Pid: 14268, comm: kworker/0:0 Not tainted 3.9.2-1 #1
Nov 16 07:03:19 rx Call Trace:
Nov 16 07:03:19 rx  <IRQ>  [<ffffffff8105070a>] warn_slowpath_common+0x7a/0xc0
Nov 16 07:03:19 rx  [<ffffffff810507f1>] warn_slowpath_fmt+0x41/0x50
Nov 16 07:03:19 rx  [<ffffffff8168e2ab>] dev_watchdog+0x25b/0x270
Nov 16 07:03:19 rx  [<ffffffff8168e050>] ? __netdev_watchdog_up+0x80/0x80
Nov 16 07:03:19 rx  [<ffffffff81060464>] call_timer_fn+0x44/0x120
Nov 16 07:03:19 rx  [<ffffffff81060a71>] run_timer_softirq+0x241/0x2b0
Nov 16 07:03:19 rx  [<ffffffff8168e050>] ? __netdev_watchdog_up+0x80/0x80
Nov 16 07:03:19 rx  [<ffffffff8105893f>] __do_softirq+0xef/0x270
Nov 16 07:03:19 rx  [<ffffffff81058bc5>] irq_exit+0xb5/0xc0
Nov 16 07:03:19 rx  [<ffffffff81435a8f>] xen_evtchn_do_upcall+0x2f/0x40
Nov 16 07:03:19 rx  [<ffffffff817a24fe>] xen_do_hypervisor_callback+0x1e/0x30
Nov 16 07:03:19 rx  <EOI>  [<ffffffff813b3722>] ? delay_tsc+0x32/0x80
Nov 16 07:03:19 rx  [<ffffffff813b373a>] ? delay_tsc+0x4a/0x80
Nov 16 07:03:19 rx  [<ffffffff813b36e8>] ? __const_udelay+0x28/0x30
Nov 16 07:03:19 rx  [<ffffffffa0025b0e>] ? e1000e_read_phy_reg_mdic+0xce/0x120 [e1000e]
Nov 16 07:03:19 rx  [<ffffffffa0018bd0>] ? e1000_get_hw_semaphore_82574+0x20/0x40 [e1000e]
Nov 16 07:03:19 rx  [<ffffffffa0027a75>] ? e1000e_read_phy_reg_bm2+0x55/0xb0 [e1000e]
Nov 16 07:03:19 rx  [<ffffffffa00330d6>] ? e1000e_flush_descriptors+0x96/0x270 [e1000e]
Nov 16 07:03:19 rx  [<ffffffffa00181b7>] ? e1000_check_phy_82574+0x27/0x60 [e1000e]
Nov 16 07:03:19 rx  [<ffffffffa0034178>] ? e1000_watchdog_task+0x648/0x830 [e1000e]
Nov 16 07:03:19 rx  [<ffffffff81797877>] ? __schedule+0x3a7/0x7c0
Nov 16 07:03:19 rx  [<ffffffff8106c74e>] ? process_one_work+0x16e/0x430
Nov 16 07:03:19 rx  [<ffffffff8106ea3c>] ? worker_thread+0x11c/0x410
Nov 16 07:03:19 rx  [<ffffffff8106e920>] ? manage_workers+0x360/0x360
Nov 16 07:03:19 rx  [<ffffffff810738f6>] ? kthread+0xc6/0xd0
Nov 16 07:03:19 rx  [<ffffffff81003869>] ? xen_end_context_switch+0x19/0x20
Nov 16 07:03:19 rx  [<ffffffff81073830>] ? kthread_freezable_should_stop+0x70/0x70
Nov 16 07:03:19 rx  [<ffffffff817a10bc>] ? ret_from_fork+0x7c/0xb0
Nov 16 07:03:19 rx  [<ffffffff81073830>] ? kthread_freezable_should_stop+0x70/0x70
Nov 16 07:03:19 rx ---[ end trace f896af6c9f44182d ]---
Nov 16 07:03:19 rx e1000e 0000:03:00.0 eth0: Reset adapter unexpectedly
Nov 16 07:03:19 rx BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
Nov 16 07:03:19 rx IP: [<ffffffffa0031d51>] e1000_clean_rx_irq+0x101/0x490 [e1000e]
Nov 16 07:03:19 rx PGD 4a6c3067 PUD 4f440067 PMD 0
Nov 16 07:03:19 rx Oops: 0000 [#1] SMP
Nov 16 07:03:19 rx Modules linked in: xt_u32 xt_physdev ip6table_mangle ip6_tables ebt_comment ebt_set ebt_arp ebt_limit ebt_ip6 ebt_ip ip_set_hash_net ip_set ebtable_nat xen_gntdev bonding ebtable_filter 8021q mrp e1000e ptp pps_core
Nov 16 07:03:19 rx CPU 0
Nov 16 07:03:19 rx Pid: 14268, comm: kworker/0:0 Tainted: G        W 3.9.2-1 #1 Supermicro X8DT6/X8DT6
Nov 16 07:03:19 rx RIP: e030:[<ffffffffa0031d51>] [<ffffffffa0031d51>] e1000_clean_rx_irq+0x101/0x490 [e1000e]
Nov 16 07:03:19 rx RSP: e02b:ffff88008ea03d98  EFLAGS: 00010202
Nov 16 07:03:19 rx RAX: 000000000000001a RBX: ffffc900115f9000 RCX: ffff88008ea03e64
Nov 16 07:03:19 rx RDX: ffff88008ea03e64 RSI: ffff880002dc6e00 RDI: ffff8800844026c0
Nov 16 07:03:19 rx RBP: ffff88008ea03e38 R08: 00000000000169e0 R09: ffffea00000b7180
Nov 16 07:03:19 rx R10: 0000000000020063 R11: 0000000000000000 R12: 0000000000000000
Nov 16 07:03:19 rx R13: ffff880081fb6000 R14: ffff88008016c700 R15: ffff880081fb6000
Nov 16 07:03:19 rx FS:  00007fde5acb8700(0000) GS:ffff88008ea00000(0000) knlGS:0000000000000000
Nov 16 07:03:19 rx CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 16 07:03:19 rx CR2: 00000000000000d0 CR3: 0000000071e80000 CR4: 0000000000002660
Nov 16 07:03:19 rx DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 16 07:03:19 rx DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 16 07:03:19 rx Process kworker/0:0 (pid: 14268, threadinfo ffff880045f8a000, task ffff88005c9c0000)
Nov 16 07:03:19 rx Stack:
Nov 16 07:03:19 rx  ffff880080220000 ffff880002dc6a00 000001d70000003c 0000001100020063
Nov 16 07:03:19 rx  000000408ea03e18 ffff880083d7d000 000005dc00000019 ffff88008016c000
Nov 16 07:03:19 rx  ffff880083d7d098 0000000000000000 ffffc900115f9000 ffff88008ea03e64
Nov 16 07:03:19 rx Call Trace:
Nov 16 07:03:19 rx  <IRQ>
Nov 16 07:03:19 rx  [<ffffffffa003354d>] e1000e_poll+0xbd/0x2d0 [e1000e]
Nov 16 07:03:19 rx  [<ffffffff81673d7f>] net_rx_action+0xaf/0x220
Nov 16 07:03:19 rx  [<ffffffff810c4e72>] ? handle_edge_irq+0x92/0x140
Nov 16 07:03:19 rx  [<ffffffff8105893f>] __do_softirq+0xef/0x270
Nov 16 07:03:19 rx  [<ffffffff81058bc5>] irq_exit+0xb5/0xc0
Nov 16 07:03:19 rx  [<ffffffff81435a8f>] xen_evtchn_do_upcall+0x2f/0x40
Nov 16 07:03:19 rx  [<ffffffff817a24fe>] xen_do_hypervisor_callback+0x1e/0x30
Nov 16 07:03:19 rx  <EOI>
Nov 16 07:03:19 rx  [<ffffffff81664200>] ? skb_release_data+0x80/0xf0
Nov 16 07:03:19 rx  [<ffffffff8113291b>] ? __slab_free+0x14b/0x380
Nov 16 07:03:19 rx  [<ffffffff810f9dcb>] ? __free_pages_ok+0x8b/0xa0
Nov 16 07:03:19 rx  [<ffffffff810f9df6>] ? free_compound_page+0x16/0x20
Nov 16 07:03:19 rx  [<ffffffff810fdf27>] ? __put_compound_page+0x17/0x20
Nov 16 07:03:19 rx  [<ffffffff81132dcc>] ? kmem_cache_free+0x14c/0x170
Nov 16 07:03:19 rx  [<ffffffff816642f2>] ? __kfree_skb+0x42/0xa0
Nov 16 07:03:19 rx  [<ffffffff816643bc>] ? consume_skb+0x2c/0x70
Nov 16 07:03:19 rx  [<ffffffffa00324d7>] ? e1000_clean_rx_ring+0xb7/0x2c0 [e1000e]
Nov 16 07:03:19 rx  [<ffffffffa0034bec>] ? e1000e_down+0x10c/0x260 [e1000e]
Nov 16 07:03:19 rx  [<ffffffffa0037ff4>] ? e1000e_reinit_locked+0x44/0x60 [e1000e]
Nov 16 07:03:19 rx  [<ffffffffa0038102>] ? e1000_reset_task+0x42/0x7b0 [e1000e]
Nov 16 07:03:19 rx  [<ffffffff8106145a>] ? mod_timer+0x11a/0x1f0
Nov 16 07:03:19 rx  [<ffffffffa00340a6>] ? e1000_watchdog_task+0x576/0x830 [e1000e]
Nov 16 07:03:19 rx  [<ffffffff81797877>] ? __schedule+0x3a7/0x7c0
Nov 16 07:03:19 rx  [<ffffffff8106c74e>] ? process_one_work+0x16e/0x430
Nov 16 07:03:19 rx  [<ffffffff8106ea3c>] ? worker_thread+0x11c/0x410
Nov 16 07:03:19 rx  [<ffffffff8106e920>] ? manage_workers+0x360/0x360
Nov 16 07:03:19 rx  [<ffffffff810738f6>] ? kthread+0xc6/0xd0
Nov 16 07:03:19 rx  [<ffffffff81003869>] ? xen_end_context_switch+0x19/0x20
Nov 16 07:03:19 rx  [<ffffffff81073830>] ? kthread_freezable_should_stop+0x70/0x70
Nov 16 07:03:19 rx  [<ffffffff817a10bc>] ? ret_from_fork+0x7c/0xb0
Nov 16 07:03:19 rx  [<ffffffff81073830>] ? kthread_freezable_should_stop+0x70/0x70
Nov 16 07:03:19 rx Code: 3b 45 84 0f 8d f1 01 00 00 48 8b 5d b0 4d 89 fd 48 03 5d a8 48 8b 4d b8 ff c0 89 01 0f ae e8 4c 8b 63 08 48 c7 43 08 00 00 00 00 <49> 8b 84 24 d0 00 00 00 48 8b 75 c0 ff 45 cc 0f 18 08 8b 45 cc
Nov 16 07:03:19 rx RIP  [<ffffffffa0031d51>] e1000_clean_rx_irq+0x101/0x490 [e1000e]
Nov 16 07:03:19 rx  RSP <ffff88008ea03d98>
Nov 16 07:03:19 rx CR2: 00000000000000d0
Nov 16 07:03:19 rx ---[ end trace f896af6c9f44182e ]---
Nov 16 07:03:19 rx Kernel panic - not syncing: Fatal exception in interrupt
--------
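
One way to tie the oops to the source is to read the faulting instruction from the "Code:" line. The byte marked <49> starts the sequence 49 8b 84 24 d0 00 00 00, which decodes as mov 0xd0(%r12),%rax. R12 is 0 in the register dump, so the effective address is 0xd0, which matches CR2. What follows is a minimal user-space sketch of that decode in C, not kernel code. The names decode_mov_load and struct mem_operand are hypothetical, and the function only handles this one instruction form (REX.W prefix, opcode 8b, SIB base register with a 32-bit displacement and no index register).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: which base register and displacement a load uses. */
struct mem_operand {
    int base_reg;   /* 0..15, x86-64 register number (12 == R12) */
    int32_t disp;   /* signed 32-bit displacement */
    int ok;         /* 1 if the bytes matched the supported form */
};

/*
 * Decode "REX.W 8B /r" (MOV r64, r/m64) when the memory operand is
 * [base + disp32] expressed through a SIB byte with no index register.
 * Any other encoding is rejected with ok == 0.
 */
struct mem_operand decode_mov_load(const uint8_t *code, size_t len)
{
    struct mem_operand out = { -1, 0, 0 };
    uint8_t rex, modrm, sib;

    assert(code != NULL);
    if (len < 8)
        return out;

    rex = code[0];
    modrm = code[2];
    sib = code[3];

    if ((rex & 0xf8) != 0x48)           /* REX prefix with W=1 */
        return out;
    if (code[1] != 0x8b)                /* MOV r64, r/m64 */
        return out;
    if ((modrm >> 6) != 2 || (modrm & 7) != 4)
        return out;                     /* mod=10 (disp32), rm=100 (SIB) */
    if (((sib >> 3) & 7) != 4 || (rex & 0x02))
        return out;                     /* index field must mean "none" */

    /* Base register: low three bits of SIB, extended by REX.B. */
    out.base_reg = (sib & 7) | ((rex & 0x01) << 3);
    /* Displacement is stored little-endian after the SIB byte. */
    out.disp = (int32_t)((uint32_t)code[4] |
                         ((uint32_t)code[5] << 8) |
                         ((uint32_t)code[6] << 16) |
                         ((uint32_t)code[7] << 24));
    out.ok = 1;
    return out;
}
```

Given the eight bytes from the oops, the sketch reports base register 12 (R12) and displacement 0xd0, so a NULL pointer in R12 produces a fault at address 0xd0.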


* Re: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
From: Bjorn Helgaas @ 2013-11-20 21:22 UTC
  To: Nick Pegg; +Cc: linux-kernel, e1000-devel

[+cc e1000-devel]

On Wed, Nov 20, 2013 at 11:44 AM, Nick Pegg <nick@nickpegg.com> wrote:
> Hello,
>
> I've been seeing some servers hit a condition where they receive a
> large number of packets (over 500,000 per second, for example) which
> causes a kernel panic due to a null pointer dereference. I've included
> the tracebacks below.
>
> I have not been able to reproduce this in my lab, but out in the field
> I've seen this happen with kernel versions 3.7.6 through 3.9.2 (e1000e
> driver versions 2.1.4-k through 2.2.14-k), running with Intel 82574L
> NICs.
>
> I've seen previous posts to this mailing list suggesting that this is
> a hardware issue (the mitigation being turning TSO/GSO off), however
> those tracebacks didn't show the interface getting unexpectedly reset,
> causing the null pointer dereference. Is this possibly a problem with
> the e1000e driver where it's not gracefully handling the reset?
>
> Let me know if more information is needed. And please CC me in replies
> since I'm not subscribed to this list. Thanks!

Intel maintains newer drivers out-of-tree at
http://sourceforge.net/projects/e1000/, and it's possible this is some
bug that has already been fixed.  The current version there looks like
e1000e-2.5.4, released 2013-09-05.

Possible similar report: http://sourceforge.net/p/e1000/bugs/367/ (no
real data there).


* Re: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
From: Nick Pegg @ 2013-12-02 22:57 UTC
  To: linux-kernel, e1000-devel

> Intel maintains newer drivers out-of-tree at
> http://sourceforge.net/projects/e1000/, and it's possible this is some
> bug that has already been fixed.  The current version there looks like
> e1000e-2.5.4, released 2013-09-05.
>
> Possible similar report: http://sourceforge.net/p/e1000/bugs/367/ (no
> real data there).

I've looked through the existing bug reports and version changelogs
and didn't see anything that seemed very relevant.

I was able to debug the e1000e object file and get the specific code
that's bugging out after the interface is unexpectedly reset:
--------
(gdb) l *e1000_clean_rx_irq+0x101
0x19d81 is in e1000_clean_rx_irq (drivers/net/ethernet/intel/e1000e/netdev.c:933).
928                     rmb();  /* read descriptor and rx_buffer_info after status DD */
929
930                     skb = buffer_info->skb;
931                     buffer_info->skb = NULL;
932
933                     prefetch(skb->data - NET_IP_ALIGN);
934
935                     i++;
936                     if (i == rx_ring->count)
937                             i = 0;
--------

The above code is from kernel version 3.9.2 and e1000e driver version
2.2.14-k. Should there be a check here to see if skb is NULL? I
checked the latest e1000e release (2.5.4) and there is no check there
either (near netdev.c:994).
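
For illustration only, here is a minimal user-space model in C of the kind of NULL check being asked about. It is not the driver code and not a proposed patch: struct fake_skb, struct fake_buffer, struct fake_ring and clean_ring are hypothetical stand-ins, and the model assumes that a slot whose skb pointer is already NULL was emptied by some other path (for example a ring cleanup during a reset). A check like this only avoids the dereference; it does not by itself explain or close whatever race left the slot empty.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 4

/* Hypothetical stand-ins for the driver's sk_buff and buffer_info. */
struct fake_skb {
    unsigned char *data;
};

struct fake_buffer {
    struct fake_skb *skb;
};

struct fake_ring {
    struct fake_buffer buf[RING_SIZE];
    unsigned int next_to_clean;
};

/*
 * Walk up to `budget` slots starting at next_to_clean and return how
 * many buffers were consumed. The walk stops at the first slot whose
 * skb pointer is NULL instead of dereferencing it.
 */
int clean_ring(struct fake_ring *ring, int budget)
{
    unsigned int i;
    int cleaned = 0;

    assert(ring != NULL);
    i = ring->next_to_clean;

    while (cleaned < budget) {
        struct fake_skb *skb = ring->buf[i].skb;

        if (skb == NULL)        /* slot already emptied elsewhere */
            break;

        ring->buf[i].skb = NULL;
        (void)skb->data;        /* safe to touch: skb is known non-NULL */

        cleaned++;
        if (++i == RING_SIZE)   /* wrap around the ring */
            i = 0;
    }

    ring->next_to_clean = i;
    return cleaned;
}
```

With two populated slots followed by empty ones, the loop consumes the two buffers and stops at the first empty slot rather than faulting.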


* RE: [E1000-devel] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
From: Fujinaka, Todd @ 2013-12-02 23:34 UTC
  To: Nick Pegg, linux-kernel, e1000-devel

I'm having difficulty following this issue, most likely because of our email system. Can you file a new bug on sourceforge?

Thanks.

Todd Fujinaka
Software Application Engineer
Networking Division (ND)
Intel Corporation
todd.fujinaka@intel.com
(503) 712-4565



* Re: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
From: Ethan Zhao @ 2013-12-03  3:51 UTC
  To: Bjorn Helgaas; +Cc: Nick Pegg, linux-kernel, e1000-devel, Jin Feng, konrad.wilk

Bjorn,
   This does not seem to be the same bug as
http://sourceforge.net/p/e1000/bugs/367/. Nick is not running his
kernel on bare metal: per the error log, is he running it as an HVM
DomU guest or as Dom0 on Xen? If so, a NULL check alone will not fix
that.

Thanks,
Ethan

On Thu, Nov 21, 2013 at 5:22 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> [+cc e1000-devel]
>
> On Wed, Nov 20, 2013 at 11:44 AM, Nick Pegg <nick@nickpegg.com> wrote:
>> Hello,
>>
>> I've been seeing some servers hit a condition where they receive a
>> large number of packets (over 500,000 per second, for example) which
>> causes a kernel panic due to a null pointer dereference. I've included
>> the tracebacks below.
>>
>> I have not been able to reproduce this in my lab, but out in the field
>> I've seen this happen with kernel versions 3.7.6 through 3.9.2 (e1000e
>> driver versions 2.1.4-k through 2.2.14-k), running with Intel 82574L
>> NICs.
>>
>> I've seen previous posts to this mailing list suggesting that this is
>> a hardware issue (the mitigation being turning TSO/GSO off), however
>> those tracebacks didn't show the interface getting unexpectedly reset,
>> causing the null pointer dereference. Is this possibly a problem with
>> the e1000e driver where it's not gracefully handling the reset?
>>
>> Let me know if more information is needed. And please CC me in replies
>> since I'm not subscribed to this list. Thanks!
>
> Intel maintains newer drivers out-of-tree at
> http://sourceforge.net/projects/e1000/, and it's possible this is some
> bug that has already been fixed.  The current version there looks like
> e1000e-2.5.4, released 2013-09-05.
>
> Possible similar report: http://sourceforge.net/p/e1000/bugs/367/ (no
> real data there).
>
>> --------
>> Nov 16 07:03:19 rx ------------[ cut here ]------------
>> Nov 16 07:03:19 rx WARNING: at net/sched/sch_generic.c:255
>> dev_watchdog+0x25b/0x270()
>> Nov 16 07:03:19 rx Hardware name: X8DT6
>> Nov 16 07:03:19 rx NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
>> Nov 16 07:03:19 rx Modules linked in: xt_u32 xt_physdev
>> ip6table_mangle ip6_tables ebt_comment ebt_set ebt_arp ebt_limit
>> ebt_ip6 ebt_ip ip_set_hash_net ip_set ebtable_nat xen_gntdev bonding
>> ebtable_filter 8021q mrp e1000e ptp pps_core
>> Nov 16 07:03:19 rx Pid: 14268, comm: kworker/0:0 Not tainted 3.9.2-1 #1
>> Nov 16 07:03:19 rx Call Trace:
>> Nov 16 07:03:19 rx  <IRQ>  [<ffffffff8105070a>] warn_slowpath_common+0x7a/0xc0
>> Nov 16 07:03:19 rx  [<ffffffff810507f1>] warn_slowpath_fmt+0x41/0x50
>> Nov 16 07:03:19 rx  [<ffffffff8168e2ab>] dev_watchdog+0x25b/0x270
>> Nov 16 07:03:19 rx  [<ffffffff8168e050>] ? __netdev_watchdog_up+0x80/0x80
>> Nov 16 07:03:19 rx  [<ffffffff81060464>] call_timer_fn+0x44/0x120
>> Nov 16 07:03:19 rx  [<ffffffff81060a71>] run_timer_softirq+0x241/0x2b0
>> Nov 16 07:03:19 rx  [<ffffffff8168e050>] ? __netdev_watchdog_up+0x80/0x80
>> Nov 16 07:03:19 rx  [<ffffffff8105893f>] __do_softirq+0xef/0x270
>> Nov 16 07:03:19 rx  [<ffffffff81058bc5>] irq_exit+0xb5/0xc0
>> Nov 16 07:03:19 rx  [<ffffffff81435a8f>] xen_evtchn_do_upcall+0x2f/0x40
>> Nov 16 07:03:19 rx  [<ffffffff817a24fe>] xen_do_hypervisor_callback+0x1e/0x30
>> Nov 16 07:03:19 rx  <EOI>  [<ffffffff813b3722>] ? delay_tsc+0x32/0x80
>> Nov 16 07:03:19 rx  [<ffffffff813b373a>] ? delay_tsc+0x4a/0x80
>> Nov 16 07:03:19 rx  [<ffffffff813b36e8>] ? __const_udelay+0x28/0x30
>> Nov 16 07:03:19 rx  [<ffffffffa0025b0e>] ?
>> e1000e_read_phy_reg_mdic+0xce/0x120 [e1000e]
>> Nov 16 07:03:19 rx  [<ffffffffa0018bd0>] ?
>> e1000_get_hw_semaphore_82574+0x20/0x40 [e1000e]
>> Nov 16 07:03:19 rx  [<ffffffffa0027a75>] ?
>> e1000e_read_phy_reg_bm2+0x55/0xb0 [e1000e]
>> Nov 16 07:03:19 rx  [<ffffffffa00330d6>] ?
>> e1000e_flush_descriptors+0x96/0x270 [e1000e]
>> Nov 16 07:03:19 rx  [<ffffffffa00181b7>] ?
>> e1000_check_phy_82574+0x27/0x60 [e1000e]
>> Nov 16 07:03:19 rx  [<ffffffffa0034178>] ?
>> e1000_watchdog_task+0x648/0x830 [e1000e]
>> Nov 16 07:03:19 rx  [<ffffffff81797877>] ? __schedule+0x3a7/0x7c0
>> Nov 16 07:03:19 rx  [<ffffffff8106c74e>] ? process_one_work+0x16e/0x430
>> Nov 16 07:03:19 rx  [<ffffffff8106ea3c>] ? worker_thread+0x11c/0x410
>> Nov 16 07:03:19 rx  [<ffffffff8106e920>] ? manage_workers+0x360/0x360
>> Nov 16 07:03:19 rx  [<ffffffff810738f6>] ? kthread+0xc6/0xd0
>> Nov 16 07:03:19 rx  [<ffffffff81003869>] ? xen_end_context_switch+0x19/0x20
>> Nov 16 07:03:19 rx  [<ffffffff81073830>] ?
>> kthread_freezable_should_stop+0x70/0x70
>> Nov 16 07:03:19 rx  [<ffffffff817a10bc>] ? ret_from_fork+0x7c/0xb0
>> Nov 16 07:03:19 rx  [<ffffffff81073830>] ?
>> kthread_freezable_should_stop+0x70/0x70
>> Nov 16 07:03:19 rx ---[ end trace f896af6c9f44182d ]---
>> Nov 16 07:03:19 rx e1000e 0000:03:00.0 eth0: Reset adapter unexpectedly
>> Nov 16 07:03:19 rx BUG: unable to handle kernel NULL pointer
>> dereference at 00000000000000d0
>> Nov 16 07:03:19 rx IP: [<ffffffffa0031d51>]
>> e1000_clean_rx_irq+0x101/0x490 [e1000e]
>> Nov 16 07:03:19 rx PGD 4a6c3067 PUD 4f440067 PMD 0
>> Nov 16 07:03:19 rx Oops: 0000 [#1] SMP
>> Nov 16 07:03:19 rx Modules linked in: xt_u32 xt_physdev
>> ip6table_mangle ip6_tables ebt_comment ebt_set ebt_arp ebt_limit
>> ebt_ip6 ebt_ip ip_set_hash_net ip_set ebtable_nat xen_gntdev bonding
>> ebtable_filter 8021q mrp e1000e ptp pps_core
>> Nov 16 07:03:19 rx CPU 0
>> Nov 16 07:03:19 rx Pid: 14268, comm: kworker/0:0 Tainted: G        W
>>  3.9.2-1 #1 Supermicro X8DT6/X8DT6
>> Nov 16 07:03:19 rx RIP: e030:[<ffffffffa0031d51>]
>> [<ffffffffa0031d51>] e1000_clean_rx_irq+0x101/0x490 [e1000e]
>> Nov 16 07:03:19 rx RSP: e02b:ffff88008ea03d98  EFLAGS: 00010202
>> Nov 16 07:03:19 rx RAX: 000000000000001a RBX: ffffc900115f9000 RCX:
>> ffff88008ea03e64
>> Nov 16 07:03:19 rx RDX: ffff88008ea03e64 RSI: ffff880002dc6e00 RDI:
>> ffff8800844026c0
>> Nov 16 07:03:19 rx RBP: ffff88008ea03e38 R08: 00000000000169e0 R09:
>> ffffea00000b7180
>> Nov 16 07:03:19 rx R10: 0000000000020063 R11: 0000000000000000 R12:
>> 0000000000000000
>> Nov 16 07:03:19 rx R13: ffff880081fb6000 R14: ffff88008016c700 R15:
>> ffff880081fb6000
>> Nov 16 07:03:19 rx FS:  00007fde5acb8700(0000)
>> GS:ffff88008ea00000(0000) knlGS:0000000000000000
>> Nov 16 07:03:19 rx CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>> Nov 16 07:03:19 rx CR2: 00000000000000d0 CR3: 0000000071e80000 CR4:
>> 0000000000002660
>> Nov 16 07:03:19 rx DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> Nov 16 07:03:19 rx DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>> 0000000000000400
>> Nov 16 07:03:19 rx Process kworker/0:0 (pid: 14268, threadinfo
>> ffff880045f8a000, task ffff88005c9c0000)
>> Nov 16 07:03:19 rx Stack:
>> Nov 16 07:03:19 rx  ffff880080220000 ffff880002dc6a00 000001d70000003c
>> 0000001100020063
>> Nov 16 07:03:19 rx  000000408ea03e18 ffff880083d7d000 000005dc00000019
>> ffff88008016c000
>> Nov 16 07:03:19 rx  ffff880083d7d098 0000000000000000 ffffc900115f9000
>> ffff88008ea03e64
>> Nov 16 07:03:19 rx Call Trace:
>> Nov 16 07:03:19 rx  <IRQ>
>> Nov 16 07:03:19 rx  [<ffffffffa003354d>] e1000e_poll+0xbd/0x2d0 [e1000e]
>> Nov 16 07:03:19 rx  [<ffffffff81673d7f>] net_rx_action+0xaf/0x220
>> Nov 16 07:03:19 rx  [<ffffffff810c4e72>] ? handle_edge_irq+0x92/0x140
>> Nov 16 07:03:19 rx  [<ffffffff8105893f>] __do_softirq+0xef/0x270
>> Nov 16 07:03:19 rx  [<ffffffff81058bc5>] irq_exit+0xb5/0xc0
>> Nov 16 07:03:19 rx  [<ffffffff81435a8f>] xen_evtchn_do_upcall+0x2f/0x40
>> Nov 16 07:03:19 rx  [<ffffffff817a24fe>] xen_do_hypervisor_callback+0x1e/0x30
>> Nov 16 07:03:19 rx  <EOI>
>> Nov 16 07:03:19 rx  [<ffffffff81664200>] ? skb_release_data+0x80/0xf0
>> Nov 16 07:03:19 rx  [<ffffffff8113291b>] ? __slab_free+0x14b/0x380
>> Nov 16 07:03:19 rx  [<ffffffff810f9dcb>] ? __free_pages_ok+0x8b/0xa0
>> Nov 16 07:03:19 rx  [<ffffffff810f9df6>] ? free_compound_page+0x16/0x20
>> Nov 16 07:03:19 rx  [<ffffffff810fdf27>] ? __put_compound_page+0x17/0x20
>> Nov 16 07:03:19 rx  [<ffffffff81132dcc>] ? kmem_cache_free+0x14c/0x170
>> Nov 16 07:03:19 rx  [<ffffffff816642f2>] ? __kfree_skb+0x42/0xa0
>> Nov 16 07:03:19 rx  [<ffffffff816643bc>] ? consume_skb+0x2c/0x70
>> Nov 16 07:03:19 rx  [<ffffffffa00324d7>] ? e1000_clean_rx_ring+0xb7/0x2c0 [e1000e]
>> Nov 16 07:03:19 rx  [<ffffffffa0034bec>] ? e1000e_down+0x10c/0x260 [e1000e]
>> Nov 16 07:03:19 rx  [<ffffffffa0037ff4>] ? e1000e_reinit_locked+0x44/0x60 [e1000e]
>> Nov 16 07:03:19 rx  [<ffffffffa0038102>] ? e1000_reset_task+0x42/0x7b0 [e1000e]
>> Nov 16 07:03:19 rx  [<ffffffff8106145a>] ? mod_timer+0x11a/0x1f0
>> Nov 16 07:03:19 rx  [<ffffffffa00340a6>] ? e1000_watchdog_task+0x576/0x830 [e1000e]
>> Nov 16 07:03:19 rx  [<ffffffff81797877>] ? __schedule+0x3a7/0x7c0
>> Nov 16 07:03:19 rx  [<ffffffff8106c74e>] ? process_one_work+0x16e/0x430
>> Nov 16 07:03:19 rx  [<ffffffff8106ea3c>] ? worker_thread+0x11c/0x410
>> Nov 16 07:03:19 rx  [<ffffffff8106e920>] ? manage_workers+0x360/0x360
>> Nov 16 07:03:19 rx  [<ffffffff810738f6>] ? kthread+0xc6/0xd0
>> Nov 16 07:03:19 rx  [<ffffffff81003869>] ? xen_end_context_switch+0x19/0x20
>> Nov 16 07:03:19 rx  [<ffffffff81073830>] ? kthread_freezable_should_stop+0x70/0x70
>> Nov 16 07:03:19 rx  [<ffffffff817a10bc>] ? ret_from_fork+0x7c/0xb0
>> Nov 16 07:03:19 rx  [<ffffffff81073830>] ? kthread_freezable_should_stop+0x70/0x70
>> Nov 16 07:03:19 rx Code: 3b 45 84 0f 8d f1 01 00 00 48 8b 5d b0 4d 89 fd 48 03 5d a8 48 8b 4d b8 ff c0 89 01 0f ae e8 4c 8b 63 08 48 c7 43 08 00 00 00 00 <49> 8b 84 24 d0 00 00 00 48 8b 75 c0 ff 45 cc 0f 18 08 8b 45 cc
>> Nov 16 07:03:19 rx RIP  [<ffffffffa0031d51>] e1000_clean_rx_irq+0x101/0x490 [e1000e]
>> Nov 16 07:03:19 rx  RSP <ffff88008ea03d98>
>> Nov 16 07:03:19 rx CR2: 00000000000000d0
>> Nov 16 07:03:19 rx ---[ end trace f896af6c9f44182e ]---
>> Nov 16 07:03:19 rx Kernel panic - not syncing: Fatal exception in interrupt
>> --------

^ permalink raw reply	[flat|nested] 9+ messages in thread
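
One way to read the oops quoted above is to compare the faulting instruction with the register dump. The sketch below is only an illustration, not something posted in the thread: it hand-decodes the bytes that follow the <49> marker in the Code: line, and the decoder deliberately handles just this one encoding (REX.W 8B /r with a SIB byte and a 32-bit displacement). The function and variable names are made up for the example; the R12 and CR2 values are copied from the dump.

```python
import struct

# x86-64 general-purpose register names, indexed by their 4-bit encoding.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

# Bytes starting at the faulting instruction (the <49> marker in "Code:").
CODE_AT_RIP = bytes.fromhex("49 8b 84 24 d0 00 00 00")

# Values copied from the register dump in the oops.
R12_FROM_DUMP = 0x0000000000000000
CR2_FROM_DUMP = 0x00000000000000D0


def decode_mov_load(code):
    """Decode 'mov r64, [base + disp32]' in exactly one encoding form.

    This is not a general disassembler: anything other than
    REX.W + 0x8B + ModRM(mod=10, rm=100) + SIB(no index) is rejected.
    """
    rex, opcode, modrm, sib = code[0], code[1], code[2], code[3]
    if (rex & 0xF0) != 0x40 or not (rex & 0x08):
        raise ValueError("expected a REX prefix with W set")
    if opcode != 0x8B:
        raise ValueError("expected opcode 0x8B (mov r64, r/m64)")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm != 0b100:
        raise ValueError("expected SIB addressing with disp32")
    index = ((sib >> 3) & 7) | (((rex >> 1) & 1) << 3)  # REX.X extends index
    if index != 0b100:
        raise ValueError("index registers are not handled here")
    base = (sib & 7) | ((rex & 1) << 3)                 # REX.B extends base
    dest = reg | (((rex >> 2) & 1) << 3)                # REX.R extends reg
    (disp,) = struct.unpack_from("<i", code, 4)         # little-endian disp32
    return REG64[dest], REG64[base], disp


if __name__ == "__main__":
    dest, base, disp = decode_mov_load(CODE_AT_RIP)
    print(f"mov {dest}, [{base} + {disp:#x}]")
    fault_addr = R12_FROM_DUMP + disp
    print(f"computed fault address {fault_addr:#x}, CR2 {CR2_FROM_DUMP:#x}")
```

Under these assumptions the instruction decodes as a load through r12 at offset 0xd0, and with R12 equal to zero the computed address matches CR2, which is consistent with a NULL pointer being dereferenced in e1000_clean_rx_irq. Which structure field lives at that offset is not something the dump alone tells us.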

* Re: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
  2013-12-03  3:51   ` Ethan Zhao
@ 2013-12-03 13:33     ` Nick Pegg
  2013-12-09  6:41       ` Ethan Zhao
  0 siblings, 1 reply; 9+ messages in thread
From: Nick Pegg @ 2013-12-03 13:33 UTC (permalink / raw)
  To: Ethan Zhao
  Cc: Bjorn Helgaas, linux-kernel, e1000-devel, Jin Feng, konrad.wilk

On Mon, Dec 2, 2013 at 10:51 PM, Ethan Zhao <ethan.kernel@gmail.com> wrote:
> Bjorn,
>    Seems not the same bug as  http://sourceforge.net/p/e1000/bugs/367/
> ,  Nick is not running his kernel on bare metal, per the error log,
> he runs his kernel as HVM DomU guest or Dom0 on XEN ?  so just a check
> of NULL will not fix that.
>

Sorry, I neglected to say in my original email that the kernel is
running as a Xen Dom0. Per Todd's request, I've opened a bug report on
sourceforge and will follow up with this issue there:
https://sourceforge.net/p/e1000/bugs/385/

Thanks,
Nick

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
  2013-12-03 13:33     ` Nick Pegg
@ 2013-12-09  6:41       ` Ethan Zhao
  0 siblings, 0 replies; 9+ messages in thread
From: Ethan Zhao @ 2013-12-09  6:41 UTC (permalink / raw)
  To: Nick Pegg; +Cc: Bjorn Helgaas, linux-kernel, e1000-devel, Jin Feng, konrad.wilk

Nick,
    You could try the in-tree 7.3.21-k8-NAPI driver or the out-of-tree
version, as Bjorn mentioned.
Reading and debugging an old driver version is not an interesting task
for anyone.

Thanks,
Ethan


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
  2013-11-14 10:18 Milan Kocian
@ 2013-11-14 16:40 ` Jeff Kirsher
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff Kirsher @ 2013-11-14 16:40 UTC (permalink / raw)
  To: Milan Kocian, David Ertman, Bruce Allan; +Cc: netdev, jesse.brandeburg

[-- Attachment #1: Type: text/plain, Size: 6966 bytes --]

On Thu, 2013-11-14 at 11:18 +0100, Milan Kocian wrote:
> hi,
> 
> This is probably for the Intel folks ...

Adding Bruce Allan and David Ertman (e1000e maintainers)...




^ permalink raw reply	[flat|nested] 9+ messages in thread

* NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
@ 2013-11-14 10:18 Milan Kocian
  2013-11-14 16:40 ` Jeff Kirsher
  0 siblings, 1 reply; 9+ messages in thread
From: Milan Kocian @ 2013-11-14 10:18 UTC (permalink / raw)
  To: netdev; +Cc: jeffrey.t.kirsher, jesse.brandeburg

Hi,

This is probably for the Intel folks ...

I see 'Detected Hardware Unit Hang' in the log repeatedly until
transmit hangs for good (until reboot :-)). I tried the old fixeep
script that I got from Jesse Brandeburg some years ago, but with
no success. Any ideas? Many thanks for any answer.

kernel 3.10.18

Outputs & logs:

ethtool -i eth0
driver: e1000e
version: 2.3.2-k
firmware-version: 0.13-4
bus-info: 0000:00:19.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no


00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
	Subsystem: Intel Corporation Device 2036
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 42
	Region 0: Memory at f7d00000 (32-bit, non-prefetchable) [size=128K]
	Region 1: Memory at f7d35000 (32-bit, non-prefetchable) [size=4K]
	Region 2: I/O ports at f080 [size=32]
	Capabilities: [c8] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee0100c  Data: 4172
	Capabilities: [e0] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: e1000e



e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
  TDH                  <a9>
  TDT                  <7>
  next_to_use          <7>
  next_to_clean        <a6>
buffer_info[next_to_clean]:
  time_stamp           <4af1c>
  next_to_watch        <a9>
  jiffies              <4b12e>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
  TDH                  <a9>
  TDT                  <7>
  next_to_use          <7>
  next_to_clean        <a6>
buffer_info[next_to_clean]:
  time_stamp           <4af1c>
  next_to_watch        <a9>
  jiffies              <4b386>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
  TDH                  <a9>
  TDT                  <7>
  next_to_use          <7>
  next_to_clean        <a6>
buffer_info[next_to_clean]:
  time_stamp           <4af1c>
  next_to_watch        <a9>
  jiffies              <4b5de>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
  TDH                  <a9>
  TDT                  <7>
  next_to_use          <7>
  next_to_clean        <a6>
buffer_info[next_to_clean]:
  time_stamp           <4af1c>
  next_to_watch        <a9>
  jiffies              <4b836>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1cc/0x1e0()
NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
Modules linked in: tun ipv6 xt_nat xt_conntrack xt_tcpudp xt_LOG xt_limit iptable_nat nf_nat_ipv4 nf_nat iptable_filter ip_tables x_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ehci_pci ehci_hcd e1000e coretemp xhci_hcd hwmon usbcore crc32_pclmul iTCO_wdt iTCO_vendor_support lpc_ich crc32c_intel mfd_core ptp pps_core usb_common i2c_i801 i2c_core microcode
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.18-core2-router18 #3
Hardware name:                  /DQ77KB, BIOS KBQ7710H.86A.0049.2013.0130.1732 01/30/2013
 c12ad3ea c102836e c135ece0 f540bf58 000000ff c123b38c c123b38c 00000000
 f4d98000 000005dc 0004b964 c10283c3 00000009 f540bf40 c135ece0 f540bf58
 c123b38c c135bdec 000000ff c135ece0 f4d98000 f8184740 00000000 00000100
Call Trace:
 [<c12ad3ea>] ? dump_stack+0xa/0x13
 [<c102836e>] ? warn_slowpath_common+0x5e/0x80
 [<c123b38c>] ? dev_watchdog+0x1cc/0x1e0
 [<c123b38c>] ? dev_watchdog+0x1cc/0x1e0
 [<c10283c3>] ? warn_slowpath_fmt+0x33/0x40
 [<c123b38c>] ? dev_watchdog+0x1cc/0x1e0
 [<c123b1c0>] ? dev_graft_qdisc+0x60/0x60
 [<c1033636>] ? call_timer_fn.isra.35+0x16/0x70
 [<c1045168>] ? hrtimer_run_pending+0x28/0xe0
 [<c10337a9>] ? run_timer_softirq+0x119/0x1a0
 [<c102e9e3>] ? __do_softirq+0xa3/0x160
 [<c102e940>] ? remote_softirq_receive+0x40/0x40
 <IRQ>  [<c102eb85>] ? irq_exit+0x45/0x50
 [<c101a912>] ? smp_apic_timer_interrupt+0x52/0x90
 [<c12b02a5>] ? apic_timer_interrupt+0x2d/0x34
 [<c1008002>] ? default_idle+0x2/0x10
 [<c10085a6>] ? arch_cpu_idle+0x6/0x10
 [<c1052767>] ? cpu_startup_entry+0xa7/0xf0
 [<c13be926>] ? start_kernel+0x285/0x28a
 [<c13be4ed>] ? repair_env_string+0x4d/0x4d
---[ end trace a052b68b72311803 ]---
e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
  TDH                  <92>
  TDT                  <b6>
  next_to_use          <b6>
  next_to_clean        <92>
buffer_info[next_to_clean]:
  time_stamp           <526ff>
  next_to_watch        <92>
  jiffies              <529e2>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
  TDH                  <92>
  TDT                  <b6>
  next_to_use          <b6>
  next_to_clean        <92>
buffer_info[next_to_clean]:
  time_stamp           <526ff>
  next_to_watch        <92>
  jiffies              <52c3a>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx


-- 
Milan Kocian

^ permalink raw reply	[flat|nested] 9+ messages in thread
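
The 'Detected Hardware Unit Hang' dumps in the report above can be compared more easily once the hex fields are turned into numbers. What follows is a minimal sketch, not part of the original report: it parses one dump and derives how many descriptors sit between head and tail, and how many jiffies separate time_stamp from jiffies. The ring size of 256 is an assumption (the report does not state it), jiffies wraparound is ignored, and the function names are made up for the example.

```python
import re

# One "name <hexvalue>" line of the dump, e.g. "  TDH                  <a9>".
HANG_FIELD = re.compile(r"^\s*(.+?)\s+<([0-9a-f]+)>\s*$")

# First dump from the report, copied verbatim.
SAMPLE = """\
e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
  TDH                  <a9>
  TDT                  <7>
  next_to_use          <7>
  next_to_clean        <a6>
buffer_info[next_to_clean]:
  time_stamp           <4af1c>
  next_to_watch        <a9>
  jiffies              <4b12e>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
"""


def parse_hang_dump(text):
    """Return {field name: integer value} for every 'name <hex>' line."""
    fields = {}
    for line in text.splitlines():
        match = HANG_FIELD.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def summarize(fields, ring_size=256):
    """Derive ring distances and elapsed jiffies.

    ring_size=256 is an assumption; pass the real Tx ring size if known.
    """
    return {
        # Descriptors handed to hardware but not yet fetched (tail - head).
        "hw_outstanding": (fields["TDT"] - fields["TDH"]) % ring_size,
        # Descriptors the driver has queued but not yet cleaned.
        "sw_unclean": (fields["next_to_use"] - fields["next_to_clean"]) % ring_size,
        # Difference between the two timestamps; wraparound is not handled.
        "jiffies_since_queued": fields["jiffies"] - fields["time_stamp"],
    }


if __name__ == "__main__":
    print(summarize(parse_hang_dump(SAMPLE)))
```

For the first dump, and with the assumed ring size, this gives 94 descriptors outstanding in hardware, 97 not yet cleaned by the driver, and 530 jiffies between time_stamp and jiffies. Running it over the later dumps shows the same ring indices with only jiffies advancing, which matches the reported symptom that transmit stays stuck.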

end of thread, other threads:[~2013-12-09  6:41 UTC | newest]

Thread overview: 9+ messages
2013-11-20 18:44 NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out Nick Pegg
2013-11-20 21:22 ` Bjorn Helgaas
2013-12-02 22:57   ` Nick Pegg
2013-12-02 23:34     ` [E1000-devel] " Fujinaka, Todd
2013-12-03  3:51   ` Ethan Zhao
2013-12-03 13:33     ` Nick Pegg
2013-12-09  6:41       ` Ethan Zhao
  -- strict thread matches above, loose matches on Subject: below --
2013-11-14 10:18 Milan Kocian
2013-11-14 16:40 ` Jeff Kirsher
