From: "Tantilov, Emil S" <emil.s.tantilov@intel.com>
To: Nix <nix@esperi.org.uk>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: "e1000-devel@lists.sourceforge.net"  <e1000-devel@lists.sourceforge.net>
Subject: RE: [E1000-devel] 2.6.36 abrupt total e1000e carrier loss (cured by reboot)
Date: Mon, 1 Nov 2010 12:51:49 -0600
Message-ID: <EA929A9653AAE14F841771FB1DE5A136602815A3C7@rrsmsx501.amr.corp.intel.com>
In-Reply-To: <87ocaaszx1.fsf@spindle.srvr.nix>

>-----Original Message-----
>From: Nix [mailto:nix@esperi.org.uk]
>Sent: Sunday, October 31, 2010 4:31 PM
>To: linux-kernel@vger.kernel.org
>Cc: e1000-devel@lists.sourceforge.net
>Subject: [E1000-devel] 2.6.36 abrupt total e1000e carrier loss (cured by reboot)
>
>It's the weekend, the time when busy servers get upgraded without
>annoying the users. I was just congratulating myself on an upgrade to
>2.6.36 with only a few problems (the NFS -ESTALE bug I have yet to
>localize, and a watchdog bug causing constant reboots which may well be
>the fault of the daemon), despite my pushing my luck by doing it on
>Halloween.
>
>But then, an hour or so after reboot, the server dropped off the net
>without warning, while it was idle of anything other than a trickle of
>NFS and forwarded web traffic. 'ip link' on the machine itself revealed
>that the interface on the e1000e dedicated to the gigabit subnet was in
>NO-CARRIER state (the other interface, running a 100Mb/s subnet, was
>fine). It was plainly this machine at fault: other machines on the
>gigabit subnet had carrier. Pulling the interface down and up again
>didn't help: nor did pulling the cable and reinserting it. Only a reboot
>cleared it.
>
>The netdev watchdog kicked in, but it wasn't very helpful, telling
>me only what I already knew. No other kernel messages were logged
>at the time the adapter fell off the net, or for minutes on either
>side.

Could you provide the output of lspci -vvv?

>
>Oct 31 22:50:44 spindle warning: [ 9691.647842] ------------[ cut here ]------------
>Oct 31 22:50:44 spindle warning: [ 9691.648086] WARNING: at net/sched/sch_generic.c:258 dev_watchdog+0x147/0x1db()
>Oct 31 22:50:44 spindle warning: [ 9691.648511] Hardware name: empty
>Oct 31 22:50:44 spindle info: [ 9691.648746] NETDEV WATCHDOG: fastnet (e1000e): transmit queue 0 timed out
>Oct 31 22:50:44 spindle warning: [ 9691.649024] Modules linked in: firewire_ohci firewire_core
>Oct 31 22:50:44 spindle warning: [ 9691.649399] Pid: 0, comm: kworker/0:0 Not tainted 2.6.36-dirty #1
>Oct 31 22:50:44 spindle warning: [ 9691.649639] Call Trace:
>Oct 31 22:50:44 spindle warning: [ 9691.649865]  <IRQ> [<ffffffff81062d12>] warn_slowpath_common+0x85/0x9d
>Oct 31 22:50:44 spindle warning: [ 9691.650177]  [<ffffffff81062dcd>] warn_slowpath_fmt+0x46/0x48
>Oct 31 22:50:44 spindle warning: [ 9691.650429]  [<ffffffff813c1a01>] dev_watchdog+0x147/0x1db
>Oct 31 22:50:44 spindle warning: [ 9691.650671]  [<ffffffff8106ed5a>] run_timer_softirq+0x210/0x2d8
>Oct 31 22:50:44 spindle warning: [ 9691.650921]  [<ffffffff813c18ba>] ? dev_watchdog+0x0/0x1db
>Oct 31 22:50:44 spindle warning: [ 9691.651185]  [<ffffffff81083776>] ? ktime_get+0x65/0xbe
>Oct 31 22:50:44 spindle warning: [ 9691.651429]  [<ffffffff8106845a>] __do_softirq+0xe3/0x1a5
>Oct 31 22:50:44 spindle warning: [ 9691.651674]  [<ffffffff810878d5>] ? tick_program_event+0x2a/0x2c
>Oct 31 22:50:44 spindle warning: [ 9691.651924]  [<ffffffff8102e18c>] call_softirq+0x1c/0x28
>Oct 31 22:50:44 spindle warning: [ 9691.652167]  [<ffffffff8102f4cc>] do_softirq+0x38/0x6d
>Oct 31 22:50:44 spindle warning: [ 9691.652412]  [<ffffffff810682cc>] irq_exit+0x3b/0x7d
>Oct 31 22:50:44 spindle warning: [ 9691.652658]  [<ffffffff81043b3c>] smp_apic_timer_interrupt+0x8d/0x9b
>Oct 31 22:50:44 spindle warning: [ 9691.652909]  [<ffffffff8102dc53>] apic_timer_interrupt+0x13/0x20
>Oct 31 22:50:44 spindle warning: [ 9691.653146]  <EOI> [<ffffffff8125657d>] ? acpi_idle_enter_bm+0x237/0x26b
>Oct 31 22:50:44 spindle warning: [ 9691.653446]  [<ffffffff81256578>] ? acpi_idle_enter_bm+0x232/0x26b
>Oct 31 22:50:44 spindle warning: [ 9691.653686]  [<ffffffff8135704b>] cpuidle_idle_call+0xa7/0x110
>Oct 31 22:50:44 spindle warning: [ 9691.653927]  [<ffffffff8102c5ab>] cpu_idle+0x63/0xd5
>Oct 31 22:50:44 spindle warning: [ 9691.654172]  [<ffffffff818b4f69>] start_secondary+0x1ae/0x1b2
>Oct 31 22:50:44 spindle warning: [ 9691.654415] ---[ end trace d27ba9fb6e9bfa53 ]---
>Oct 31 22:50:44 spindle err: [ 9691.654672] e1000e 0000:02:00.0: fastnet: Reset adapter
>
>A register dump from the failed adapter:
>
>Offset	Values
>--------	-----
>000:	 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>010:	 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>020:	 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>030:	 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>040:	 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>050:	 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>060:	 06 88 00 00 06 88 00 00 00 00 00 00 00 00 00 00
>070:	 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

There is a known issue on some systems with ASPM enabled that may cause the device to lose link. If the output of lspci (which I asked for above) shows ASPM enabled for the Ethernet devices, make sure to disable it in the BIOS.
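As a rough aid for reading the lspci output requested above, here is a minimal Python sketch that scans saved `lspci -vvv` text for Ethernet controllers and reports the ASPM state printed on each one's `LnkCtl:` line. It assumes the pciutils output format (device headers on unindented lines, an indented `LnkCtl: ASPM <state>;` line per PCIe device) and deliberately ignores `LnkCap:`, which lists the ASPM states a link supports rather than the ones currently enabled. The function names and the script name in the usage comment are hypothetical. lspci generally needs root to print the capability lines at all; otherwise it shows `<access denied>`.

```python
import re
import sys

# Matches the PCIe link-control line printed by `lspci -vvv` (pciutils format assumed), e.g.
#   "LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes ..."  or  "LnkCtl: ASPM Disabled; ..."
# LnkCap: lines list *supported* ASPM states, not enabled ones, so they are not matched.
LNKCTL_RE = re.compile(r"^\s*LnkCtl:\s*ASPM\s+([^;]+);")


def ethernet_aspm_states(lspci_text):
    """Map each Ethernet controller's header line to the ASPM state on its LnkCtl: line."""
    states = {}
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new device block, e.g.
            # "02:00.0 Ethernet controller: Intel Corporation ..."
            current = line if "Ethernet controller" in line else None
            continue
        if current is None:
            continue  # inside a non-Ethernet device's block
        match = LNKCTL_RE.match(line)
        if match:
            states[current] = match.group(1).strip()
    return states


def aspm_enabled(state):
    """True for any state other than "Disabled" (e.g. "L1 Enabled", "L0s L1 Enabled")."""
    return state != "Disabled"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage:  sudo lspci -vvv > lspci.txt && python3 check_aspm.py lspci.txt
    with open(sys.argv[1]) as f:
        for device, state in ethernet_aspm_states(f.read()).items():
            verdict = "ASPM ENABLED" if aspm_enabled(state) else "ASPM disabled"
            print(f"{verdict}: {device} [{state}]")
```

Any Ethernet device reported as enabled is the case the advice above covers.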

Thanks,
Emil


Thread overview: 9+ messages
2010-10-31 23:30 2.6.36 abrupt total e1000e carrier loss (cured by reboot) Nix
2010-11-01 18:51 ` Tantilov, Emil S [this message]
2010-11-01 23:08   ` [E1000-devel] " Nix
2010-11-04  2:26     ` "Brandeburg, Jesse"
2010-11-04 21:35       ` Nix
2010-11-08  8:01       ` Nix
2010-11-08 18:11         ` Tantilov, Emil S
2010-11-08 20:21           ` Nix
2010-11-14 17:10             ` Nix
