From: "Brown, Aaron F" <aaron.f.brown@intel.com>
To: Thomas Jarosch <thomas.jarosch@intra2net.com>
Cc: "Kirsher, Jeffrey T" <jeffrey.t.kirsher@intel.com>,
	'Linux Netdev List' <netdev@vger.kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	e1000-devel <e1000-devel@lists.sourceforge.net>
Subject: RE: [bisected regression] e1000e: "Detected Hardware Unit Hang"
Date: Sat, 21 Feb 2015 01:59:35 +0000
Message-ID: <309B89C4C689E141A5FF6A0C5FB2118B78D72A14@ORSMSX101.amr.corp.intel.com>
In-Reply-To: <2370189.uakqRR7OLn@storm>



> -----Original Message-----
> From: Thomas Jarosch [mailto:thomas.jarosch@intra2net.com]
> Sent: Friday, February 13, 2015 8:15 AM
> To: Brown, Aaron F
> Cc: Kirsher, Jeffrey T; 'Linux Netdev List'; Eric Dumazet; e1000-devel
> Subject: Re: [bisected regression] e1000e: "Detected Hardware Unit Hang"
> 
> Hi Aaron,
> 
> On Thursday, 12. February 2015 23:28:27 Brown, Aaron F wrote:
> > I do not have any real info.  I had been asked to try and reproduce some
> > unit hangs (maybe for this) recently and did not succeed in producing
> > them on the parts I have.  Reading through the thread I see this is
> > showing up in a NAT environment.  Is the port that is getting the unit
> > hang in the NAT system?
> 
> yes, the e1000e NIC is serving the NATed Windows client.
> 
> The setup was outlined here:
> 
>     http://marc.info/?l=linux-netdev&m=142133691713824&w=2
> 
> > I will make some attempts at replicating this with the port in a NAT
> > and/or forwarding role.  Has a bug been opened for this?  Or has information
> > for this specific unit hang been entered into one of the other unit hang
> > bugs opened against e1000e?
> 
> I didn't do anything(tm). This report sounds like the same issue:
> 
>     http://ehc.ac/p/e1000/bugs/378/
> 
> Oliver Wagner wrote the problem started to appear
> after updating from kernel 3.5 to 3.8.0.35 (new frag size code).
> 
> I just noticed now he wrote he has two identical boxes:
> 
> ---------------------------------------------------
> - Box with symptoms: Router/Firewall, packet forwarding
>   between different VLANs on eth0 and eth1
> - Box without symptoms: Fileserver, eth0/eth1 bonded
>   (VLANs used, but no forwarding)
> ---------------------------------------------------
> 
> So it looks like it's related to forwarding somehow;
> I've had the same experience, IIRC.

Thanks, that (and the multiple bug write-ups on SourceForge) gave me more than enough to go on.  I was able to replicate it on a handful of systems in my lab.  On affected systems, setting up a NAT and stressing the interfaces with even moderate traffic levels triggers it pretty quickly.  The NAT part appears to be unnecessary: just setting the systems up as a software router and running some traffic across them also triggers it, with the same apparent behavior (tx hang, watchdog timeout trace, port reset).
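
For anyone trying to confirm the same symptoms on their own router box, a minimal Python sketch for counting the relevant kernel log messages might look like the following.  It is only an illustration: the "Detected Hardware Unit Hang" string comes from the subject of this thread, the "NETDEV WATCHDOG" string is assumed to be the generic kernel tx watchdog message behind the watchdog timeout trace, and the function name, pattern names and sample log lines are all hypothetical and not taken from any report in this thread.

```python
import re
from collections import Counter

# Patterns to look for in kernel log text.
# "unit_hang" uses the string from the thread subject.
# "tx_watchdog" is an assumption: the generic kernel tx watchdog message.
PATTERNS = {
    "unit_hang": re.compile(r"Detected Hardware Unit Hang"),
    "tx_watchdog": re.compile(r"NETDEV WATCHDOG"),
}


def count_hang_events(log_text):
    """Count log lines matching each pattern in PATTERNS."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Illustrative, made-up log lines (not copied from a real system).
SAMPLE_LOG = """\
[  100.000000] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
[  102.000000] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
[  103.000000] e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
[  200.000000] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
"""

if __name__ == "__main__":
    print(dict(count_hang_events(SAMPLE_LOG)))
```

Feeding it saved dmesg output from a forwarding test run gives a rough count of how often the hang and the watchdog fired during that run.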

With an internal reproduction of the issue in hand, I have created an internal bug report, described my set of reproductions, referenced the similar external ones, and assigned it to our current e1000e developer.

Thanks again,
Aaron

> 
> Cheers,
> Thomas


Thread overview: 26+ messages
2015-01-14 15:32 [bisected regression] e1000e: "Detected Hardware Unit Hang" Thomas Jarosch
2015-01-14 17:20 ` Eric Dumazet
2015-01-15 10:11   ` Thomas Jarosch
2015-01-15 14:43     ` Eric Dumazet
2015-01-15 14:58       ` Thomas Jarosch
2015-01-15 15:25         ` Eric Dumazet
2015-01-15 15:48           ` Thomas Jarosch
2015-01-15 16:00             ` Eric Dumazet
2015-01-15 17:04               ` Thomas Jarosch
2015-01-15 17:20                 ` Eric Dumazet
2015-01-15 17:37                   ` Thomas Jarosch
2015-01-15 18:24                     ` Re: Re: Re: " Eric Dumazet
2015-01-19 16:49           ` Thomas Jarosch
2015-01-15 14:59       ` Jeff Kirsher
2015-02-11 11:23         ` Thomas Jarosch
2015-02-11 11:34           ` Jeff Kirsher
2015-02-12 23:28             ` Brown, Aaron F
2015-02-13 16:14               ` Thomas Jarosch
2015-02-21  1:59                 ` Brown, Aaron F [this message]
2015-03-23 13:58                   ` Thomas Jarosch
2015-03-23 22:37                     ` Brown, Aaron F
2015-05-27 16:00                       ` Thomas Jarosch
2015-05-30  1:18                         ` Brown, Aaron F
2015-07-29  8:51                           ` Thomas Jarosch
2019-05-02 12:58                             ` Juliana Rodrigueiro
2015-02-12  1:18           ` nick
