linux-kernel.vger.kernel.org archive mirror
* Update on e1000 troubles (over-heating!)
@ 2002-10-06  3:38 Ben Greear
  2002-10-06  3:47 ` Andre Hedrick
  0 siblings, 1 reply; 18+ messages in thread
From: Ben Greear @ 2002-10-06  3:38 UTC (permalink / raw)
  To: linux-kernel, 'netdev@oss.sgi.com'

I believe I have figured out why the e1000 crashed my machine
after .5 - 1 hours:  The NIC was over-heating.  I measured one of
the NICs after the machine crashed with an external (cheap) temp
probe.  It registered right at 50 degrees C, and this was about 15-30
seconds after it crashed.

The dual e1000 NIC I have seems to run much cooler, and has been
running at 430Mbps bi-directional on both ports for about 6 hours now
with no obvious problems.

So, I'm going to try to purchase some heat sinks and glue them onto
the e1000 server nics, to see if that fixes the problem.

Hope this proves useful to anyone experiencing similar strange
crashes!

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear




* Re: Update on e1000 troubles (over-heating!)
  2002-10-06  3:38 Update on e1000 troubles (over-heating!) Ben Greear
@ 2002-10-06  3:47 ` Andre Hedrick
  2002-10-06 22:38   ` jamal
  0 siblings, 1 reply; 18+ messages in thread
From: Andre Hedrick @ 2002-10-06  3:47 UTC (permalink / raw)
  To: Ben Greear; +Cc: linux-kernel, 'netdev@oss.sgi.com'


I have a pair of Compaq e1000's which have never overheated, and I use
them for heavy duty iSCSI testing and designing of drivers.  These are
massive 66/64 cards but still nothing like what you are reporting.

I will look some more at the issue soon.

Cheers,

Andre Hedrick
iSCSI Software Solutions Provider
http://www.PyXTechnologies.com/


On Sat, 5 Oct 2002, Ben Greear wrote:

> I believe I have figured out why the e1000 crashed my machine
> after .5 - 1 hours:  The NIC was over-heating.  I measured one of
> the NICs after the machine crashed with an external (cheap) temp
> probe.  It registered right at 50 degrees C, and this was about 15-30
> seconds after it crashed.
> 
> The dual e1000 NIC I have seems to run much cooler, and has been
> running at 430Mbps bi-directional on both ports for about 6 hours now
> with no obvious problems.
> 
> So, I'm going to try to purchase some heat sinks and glue them onto
> the e1000 server nics, to see if that fixes the problem.
> 
> Hope this proves useful to anyone experiencing similar strange
> crashes!
> 
> Thanks,
> Ben
> 
> -- 
> Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
> President of Candela Technologies Inc      http://www.candelatech.com
> ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear
> 
> 
> 



* Re: Update on e1000 troubles (over-heating!)
  2002-10-06  3:47 ` Andre Hedrick
@ 2002-10-06 22:38   ` jamal
  2002-10-07  0:14     ` Andre Hedrick
  2002-10-07  3:46     ` Ben Greear
  0 siblings, 2 replies; 18+ messages in thread
From: jamal @ 2002-10-06 22:38 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: Ben Greear, linux-kernel, 'netdev@oss.sgi.com'



On Sat, 5 Oct 2002, Andre Hedrick wrote:

>
> I have a pair of Compaq e1000's which have never overheated, and I use
> them for heavy duty iSCSI testing and designing of drivers.  These are
> massive 66/64 cards but still nothing like what you are reporting.
>
> I will look some more at the issue soon.
>

It seems like the prerequisite to reproduce it is that you beat the NIC heavily
with a lot of packets/sec and then run it at that sustained rate for at
least 30 minutes. iSCSI would tend to use MTU-sized packets, which will
not be that effective.
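
To put rough numbers on the packets/sec point: for a fixed bit rate, the
packet rate scales inversely with packet size. Below is a minimal Python
sketch, not a measurement. It assumes the 430Mbps load from Ben's first
report, a 1500-byte MTU and 64-byte minimum frames, and it ignores preamble
and inter-frame gap. The function name is made up for illustration.

```python
def packets_per_second(rate_bps: float, packet_bytes: int) -> float:
    """Approximate packet rate for a given bit rate and packet size.

    Ignores Ethernet preamble, inter-frame gap and other per-frame overhead.
    """
    return rate_bps / (packet_bytes * 8)


RATE_BPS = 430e6  # assumption: the 430Mbps load reported earlier in the thread

mtu_pps = packets_per_second(RATE_BPS, 1500)  # MTU-sized packets
small_pps = packets_per_second(RATE_BPS, 64)  # minimum-size frames

print(f"1500-byte packets: {mtu_pps:,.0f} pps")
print(f"  64-byte packets: {small_pps:,.0f} pps")
print(f"ratio: {small_pps / mtu_pps:.1f}x")
```

At the same bit rate, small frames give roughly 23 times as many packets per
second as MTU-sized ones, which is why an MTU-sized workload may exercise the
NIC far less in per-packet terms.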

cheers,
jamal






* Re: Update on e1000 troubles (over-heating!)
  2002-10-06 22:38   ` jamal
@ 2002-10-07  0:14     ` Andre Hedrick
  2002-10-07 11:56       ` jamal
  2002-10-07  3:46     ` Ben Greear
  1 sibling, 1 reply; 18+ messages in thread
From: Andre Hedrick @ 2002-10-07  0:14 UTC (permalink / raw)
  To: jamal; +Cc: Ben Greear, linux-kernel, 'netdev@oss.sgi.com'


However doing a data integrity test with a pattern buffer
write-verify-read on multi-lun, multi-session, and multiple connections
per session, while issuing load-balancing commands (ie thread tag) over
each session to roast the bandwidth of the line should be enough.

Now tossing in injected errors to randomly fail data PDUs, and calling a
sync-and-steering layer to scan the header and/or data digests to execute
a within-connection recovery, regardless of the reason, should be enough
to warm up the beast.

If that is not enough, I can toss in multi-initiators all with the
features above or invoke the interoperability modes to add the Cisco and
IBM initiators (both limited to error recovery level zero, while PyX's is
capable of error recovery level one and part of two).

Please let me know if I need to throttle it harder.

Cheers,

On Sun, 6 Oct 2002, jamal wrote:

> 
> 
> On Sat, 5 Oct 2002, Andre Hedrick wrote:
> 
> >
> > I have a pair of Compaq e1000's which have never overheated, and I use
> > them for heavy duty iSCSI testing and designing of drivers.  These are
> > massive 66/64 cards but still nothing like what you are reporting.
> >
> > I will look some more at the issue soon.
> >
> 
> It seems like the prerequisite to reproduce it is you beat the NIC heavily
> with a lot of packets/sec and then run it at that sustained rate for at
> least 30 minutes. iSCSI would tend to use MTU sized packets which will
> not be that effective.
> 
> cheers,
> jamal
> 
> 
> 
> 

Andre Hedrick
iSCSI Software Solutions Provider
http://www.PyXTechnologies.com/



* Re: Update on e1000 troubles (over-heating!)
  2002-10-06 22:38   ` jamal
  2002-10-07  0:14     ` Andre Hedrick
@ 2002-10-07  3:46     ` Ben Greear
  2002-10-07  5:26       ` David S. Miller
  2002-10-07 11:53       ` jamal
  1 sibling, 2 replies; 18+ messages in thread
From: Ben Greear @ 2002-10-07  3:46 UTC (permalink / raw)
  To: jamal; +Cc: Andre Hedrick, linux-kernel, 'netdev@oss.sgi.com'

jamal wrote:

> It seems like the prerequisite to reproduce it is you beat the NIC heavily
> with a lot of packets/sec and then run it at that sustained rate for at
> least 30 minutes. iSCSI would tend to use MTU sized packets which will
> not be that effective.

I can reproduce my crash using MTU-sized packets running only 50Mbps send + receive
on 2 NICs.  It took overnight to do it, though.  Running as hard as I can with
MTU packets will crash it as well, and much quicker.

Interestingly enough, the tg3 NIC (Netgear 302t) registered 57 deg C between
the fins of its heat sink in the 32-bit slots.  Makes me wonder if my PCI bus
is running too hot :P

Dave says I'm weird and no one else sees these bizarre problems, btw :)

More trouble-shooting to follow this next week.

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear




* Re: Update on e1000 troubles (over-heating!)
  2002-10-07  3:46     ` Ben Greear
@ 2002-10-07  5:26       ` David S. Miller
  2002-10-07 11:53       ` jamal
  1 sibling, 0 replies; 18+ messages in thread
From: David S. Miller @ 2002-10-07  5:26 UTC (permalink / raw)
  To: greearb; +Cc: hadi, andre, linux-kernel, netdev

   From: Ben Greear <greearb@candelatech.com>
   Date: Sun, 06 Oct 2002 20:46:42 -0700
   
   Dave says I'm weird and no one else sees these bizarre problems, btw :)
   
The only case where I'm really concerned about the health
of your PCI controller is the most recent case you've
reported to me where pci_find_capability(pdev, PCI_CAP_ID_PM)
fails.  That is just completely bizarre.

I hope your boards aren't being permanently harmed by your box which
is overheating.:(
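
For readers unfamiliar with what that lookup does: a capability search walks
a linked list stored in the device's PCI config space. The sketch below is a
Python model of that walk, not the kernel's C implementation. The offsets and
capability IDs are the standard PCI ones; the synthetic layout ([dc] power
management, [e4] PCI-X, [f0] MSI) is taken from the lspci -vv output for the
e1000 elsewhere in this thread, and the helper names are hypothetical.

```python
PCI_STATUS = 0x06            # status register offset
PCI_STATUS_CAP_LIST = 0x10   # status bit 4: device has a capability list
PCI_CAPABILITY_LIST = 0x34   # offset of the pointer to the first capability
PCI_CAP_ID_PM = 0x01         # power management
PCI_CAP_ID_MSI = 0x05        # message signalled interrupts
PCI_CAP_ID_PCIX = 0x07       # PCI-X


def find_capability(cfg: bytes, cap_id: int) -> int:
    """Return the config-space offset of capability cap_id, or 0 if absent."""
    status = cfg[PCI_STATUS] | (cfg[PCI_STATUS + 1] << 8)
    if not status & PCI_STATUS_CAP_LIST:
        return 0
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    for _ in range(48):  # bound the walk so a corrupt list cannot loop forever
        if pos < 0x40:       # capabilities live above the standard header
            break
        if cfg[pos] == 0xFF:  # all-ones usually means the read itself failed
            break
        if cfg[pos] == cap_id:
            return pos
        pos = cfg[pos + 1] & 0xFC  # follow the "next capability" pointer
    return 0


def synthetic_config_space() -> bytearray:
    """Build a 256-byte config space shaped like the e1000 in the lspci dump."""
    cfg = bytearray(256)
    cfg[PCI_STATUS] = PCI_STATUS_CAP_LIST
    cfg[PCI_CAPABILITY_LIST] = 0xDC
    cfg[0xDC], cfg[0xDD] = PCI_CAP_ID_PM, 0xE4
    cfg[0xE4], cfg[0xE5] = PCI_CAP_ID_PCIX, 0xF0
    cfg[0xF0], cfg[0xF1] = PCI_CAP_ID_MSI, 0x00
    return cfg


healthy = synthetic_config_space()
dead_bus = bytes([0xFF]) * 256  # what reads tend to look like when accesses fail

print(hex(find_capability(healthy, PCI_CAP_ID_PM)))
print(find_capability(dead_bus, PCI_CAP_ID_PM))
```

In this model, a device whose config reads come back as all ones yields no
power-management capability at all, which is one way a lookup like this could
fail on hardware that normally advertises it.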


* Re: Update on e1000 troubles (over-heating!)
  2002-10-07  3:46     ` Ben Greear
  2002-10-07  5:26       ` David S. Miller
@ 2002-10-07 11:53       ` jamal
  2002-10-07 11:58         ` David S. Miller
  2002-10-07 16:40         ` Ben Greear
  1 sibling, 2 replies; 18+ messages in thread
From: jamal @ 2002-10-07 11:53 UTC (permalink / raw)
  To: Ben Greear; +Cc: Andre Hedrick, linux-kernel, 'netdev@oss.sgi.com'



On Sun, 6 Oct 2002, Ben Greear wrote:

> I can reproduce my crash using mtu sized pkts running only 50Mbps
> send + receive on 2 nics.  It took over-night to do it though.  Running
> as hard as I can with MTU packets will crash it as well, and much
> quicker.
>

So is there a correlation with packet count then?


> Interestingly enough, the tg3 NIC (netgear 302t), registered 57 deg C between
> the fins of its heat sink in the 32-bit slots.  Makes me wonder if my PCI bus
> is running too hot :P

Does the problem happen with the tg3?

cheers,
jamal




* Re: Update on e1000 troubles (over-heating!)
  2002-10-07  0:14     ` Andre Hedrick
@ 2002-10-07 11:56       ` jamal
  0 siblings, 0 replies; 18+ messages in thread
From: jamal @ 2002-10-07 11:56 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: Ben Greear, linux-kernel, 'netdev@oss.sgi.com'



It does seem like you need a lot of packets over a period of time
to recreate it. So if what you are trying to do can achieve that,
you should reproduce it. How many connections and sessions can you
support? BTW, does iscsi call for a zero-copy receive?

cheers,
jamal





* Re: Update on e1000 troubles (over-heating!)
  2002-10-07 11:53       ` jamal
@ 2002-10-07 11:58         ` David S. Miller
  2002-10-07 16:40         ` Ben Greear
  1 sibling, 0 replies; 18+ messages in thread
From: David S. Miller @ 2002-10-07 11:58 UTC (permalink / raw)
  To: hadi; +Cc: greearb, andre, linux-kernel, netdev

   From: jamal <hadi@cyberus.ca>
   Date: Mon, 7 Oct 2002 07:53:26 -0400 (EDT)
   
   Does the problem happen with the tg3?

He gets hangs in one box, inoperable PCI config space accesses for the
cards in another box.


* Re: Update on e1000 troubles (over-heating!)
  2002-10-07 11:53       ` jamal
  2002-10-07 11:58         ` David S. Miller
@ 2002-10-07 16:40         ` Ben Greear
  1 sibling, 0 replies; 18+ messages in thread
From: Ben Greear @ 2002-10-07 16:40 UTC (permalink / raw)
  To: jamal; +Cc: Andre Hedrick, linux-kernel, 'netdev@oss.sgi.com'

jamal wrote:
> 
> On Sun, 6 Oct 2002, Ben Greear wrote:
> 
> 
>>I can reproduce my crash using mtu sized pkts running only 50Mbps
>>send + receive on 2 nics.  It took over-night to do it though.  Running
>>as hard as I can with MTU packets will crash it as well, and much
>>quicker.
>>
> 
> 
> So is there a correlation with packet count then?

No, running at slower speeds (50Mbps), the packet count was well over
4 billion (i.e. it successfully wrapped 32 bits).  At higher speeds, it
crashes before the 32-bit wrap, generally.  It also does not correlate
to bytes sent/received, or anything else that I could think of to look at.
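
As a side note on reading counters that wrap at 32 bits: the difference
between two samples can still be recovered with modular arithmetic, provided
the counter wrapped at most once between samples. A minimal Python sketch
follows. The names are hypothetical and this is not how the driver or any
particular tool does its accounting.

```python
COUNTER_BITS = 32
COUNTER_MOD = 1 << COUNTER_BITS


def counter_delta(previous: int, current: int) -> int:
    """Packets counted between two samples of a 32-bit counter.

    Correct as long as the counter wrapped at most once between samples.
    """
    return (current - previous) % COUNTER_MOD


class WrapTracker:
    """Accumulates a running total that is not limited to 32 bits."""

    def __init__(self, first_sample: int) -> None:
        self.last = first_sample
        self.total = 0

    def update(self, sample: int) -> int:
        """Fold a new raw counter sample into the running total."""
        self.total += counter_delta(self.last, sample)
        self.last = sample
        return self.total


tracker = WrapTracker(0xFFFFFFF0)
tracker.update(0x00000010)   # counter wrapped between these two samples
print(tracker.total)
```

Sampling often enough that at most one wrap can occur between reads keeps the
running total exact, so "well over 4 billion" packets can be tracked without
ambiguity.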

> 
> 
> 
>>Interestingly enough, the tg3 NIC (netgear 302t), registered 57 deg C between
>>the fins of its heat sink in the 32-bit slots.  Makes me wonder if my PCI bus
>>is running too hot :P
> 
> 
> Does the problem happen with the tg3?

As Dave mentioned, tg3 locks up almost immediately (like within 30 seconds),
and in the meantime, it's spitting out errors that are 'impossible'.  Those are
the messages I sent a day or two ago.

I may have cooked my cards, or something like that, because one of
the tg3s does not work in my other machine now.  Still trouble-shooting that one.

Ben


> 
> cheers,
> jamal
> 
> 


-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear




* Re: Update on e1000 troubles (over-heating!)
  2002-10-15  5:42 ` Dave Hansen
@ 2002-10-15  7:07   ` Ben Greear
  0 siblings, 0 replies; 18+ messages in thread
From: Ben Greear @ 2002-10-15  7:07 UTC (permalink / raw)
  To: Dave Hansen; +Cc: Feldman, Scott, linux-kernel, 'netdev@oss.sgi.com'

Dave Hansen wrote:

> 
> I get some strange e1000 failures too.  It usually involves the watchdog 
> kicking them back into order, but sometimes they'll stay offline for a 
> while.  Heat would explain it, though, because it only happens when I'm 
> actually using the cards for a benchmark.  I figured that it was either 
> my cables, or a shoddy switch.
> 
> The new dual-port e1000 that I have doesn't seem to have this problem, 
> even though I'm running 4 times more traffic than the singles that I had.

That was exactly the behaviour I noticed.  I believe it's because when you
run two side-by-side, they cook each other (I'm assuming you didn't run
2 2-ports side-by-side)

Try strapping a fan on them somehow and I bet all your troubles go
away (and maybe your .ibm email will shame Intel into putting heat-sinks
and/or small fans on their NICs... ;)

(I ran two Netgear 302t NICs (tigon-3) side-by-side for 4 days at max speed, and they
  didn't drop a single packet, even though their heat-sinks were too hot to
  touch!)

Ben

-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear




* Re: Update on e1000 troubles (over-heating!)
  2002-10-15  2:20 Feldman, Scott
  2002-10-15  2:37 ` Andi Kleen
  2002-10-15  5:42 ` Dave Hansen
@ 2002-10-15  7:01 ` Ben Greear
  2 siblings, 0 replies; 18+ messages in thread
From: Ben Greear @ 2002-10-15  7:01 UTC (permalink / raw)
  To: Feldman, Scott; +Cc: linux-kernel, 'netdev@oss.sgi.com'

Feldman, Scott wrote:
>>Here is the lspci information, both -x and -vv.  This is with 
>>two of the e1000 single-port NICS side-by-side.  I have also 
>>strapped a P-IV CPU fan on top of the two cards to blow some 
>>air over them....running tests now to see if that actually 
>>helps anything.  If it does, I'll be sure to send you a picture :)
> 
> 
> Ben, I checked the datasheet for the part shown in the lspci dump, and it
> shows an operating temperature of 0-55 degrees C.  You said you measured 50
> degrees C, so you're within the safe range.  Did the fans help?

The fan did help, and Andi is right, the chip was much hotter than what
my probe read (I was gently pushing it against the top of the chip, because it
was too hot to really press my finger against it to get good contact :))

With the fan blowing on the chips, it has been perfect.  This implies to me
that if you are going to run the e1000, you need significant airflow over
the chipset, and the generic 2U chassis that I have is definitely inadequate,
partially because the MB is so big that the fans are too far away from the
PCI slots...  This is all doubly true if you are running two NICs side-by-side,
which is what I was doing.

I am also considering gluing heat sinks onto the main chip, which may make it
work in more marginal environments.

Ben

-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear




* Re: Update on e1000 troubles (over-heating!)
  2002-10-15  2:20 Feldman, Scott
  2002-10-15  2:37 ` Andi Kleen
@ 2002-10-15  5:42 ` Dave Hansen
  2002-10-15  7:07   ` Ben Greear
  2002-10-15  7:01 ` Ben Greear
  2 siblings, 1 reply; 18+ messages in thread
From: Dave Hansen @ 2002-10-15  5:42 UTC (permalink / raw)
  To: Feldman, Scott
  Cc: 'Ben Greear', linux-kernel, 'netdev@oss.sgi.com'

Feldman, Scott wrote:
>>Here is the lspci information, both -x and -vv.  This is with 
>>two of the e1000 single-port NICS side-by-side.  I have also 
>>strapped a P-IV CPU fan on top of the two cards to blow some 
>>air over them....running tests now to see if that actually 
>>helps anything.  If it does, I'll be sure to send you a picture :)
> 
> Ben, I checked the datasheet for the part shown in the lspci dump, and it
> shows an operating temperature of 0-55 degrees C.  You said you measured 50
> degrees C, so you're within the safe range.  Did the fans help?
> 
> Here's the datasheet:
> http://www.intel.com/network/connectivity/resources/doc_library/data_sheets/pro1000mt_sa.pdf

I get some strange e1000 failures too.  It usually involves the 
watchdog kicking them back into order, but sometimes they'll stay 
offline for a while.  Heat would explain it, though, because it only 
happens when I'm actually using the cards for a benchmark.  I figured 
that it was either my cables, or a shoddy switch.

The new dual-port e1000 that I have doesn't seem to have this problem, 
even though I'm running 4 times more traffic than the singles that I 
had.
-- 
Dave Hansen
haveblue@us.ibm.com



* Re: Update on e1000 troubles (over-heating!)
  2002-10-15  2:37 ` Andi Kleen
@ 2002-10-15  2:54   ` Jonathan Lundell
  0 siblings, 0 replies; 18+ messages in thread
From: Jonathan Lundell @ 2002-10-15  2:54 UTC (permalink / raw)
  To: Andi Kleen, Feldman, Scott
  Cc: 'Ben Greear', linux-kernel, 'netdev@oss.sgi.com'

At 4:37am +0200 10/15/02, Andi Kleen wrote:
>> Ben, I checked the datasheet for the part shown in the lspci dump, and it
>> shows an operating temperature of 0-55 degrees C.  You said you measured 50
>> degrees C, so you're within the safe range.  Did the fans help?
>
>The thermometer he used likely showed a much lower temperature than what was
>actually on the die. 5-10 C more are not unlikely. It's hard to measure chip
>temperatures accurately without an on die thermal diode or special kit.
>So I would expect that when an external normal thermometer showed 50C
>it was already operating out of spec.

The datasheet's for the card, so the operating temperature is surely 
ambient, not die temperature. "Ambient measured how?" would be a 
reasonable question, though.
-- 
/Jonathan Lundell.


* Re: Update on e1000 troubles (over-heating!)
  2002-10-15  2:20 Feldman, Scott
@ 2002-10-15  2:37 ` Andi Kleen
  2002-10-15  2:54   ` Jonathan Lundell
  2002-10-15  5:42 ` Dave Hansen
  2002-10-15  7:01 ` Ben Greear
  2 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2002-10-15  2:37 UTC (permalink / raw)
  To: Feldman, Scott
  Cc: 'Ben Greear', linux-kernel, 'netdev@oss.sgi.com'

On Mon, Oct 14, 2002 at 07:20:04PM -0700, Feldman, Scott wrote:
> > Here is the lspci information, both -x and -vv.  This is with 
> > two of the e1000 single-port NICS side-by-side.  I have also 
> > strapped a P-IV CPU fan on top of the two cards to blow some 
> > air over them....running tests now to see if that actually 
> > helps anything.  If it does, I'll be sure to send you a picture :)
> 
> Ben, I checked the datasheet for the part shown in the lspci dump, and it
> shows an operating temperature of 0-55 degrees C.  You said you measured 50
> degrees C, so you're within the safe range.  Did the fans help?

The thermometer he used likely showed a much lower temperature than what was
actually on the die. 5-10 C more is not unlikely. It's hard to measure chip
temperatures accurately without an on-die thermal diode or special kit.
So I would expect that when a normal external thermometer showed 50C,
it was already operating out of spec.
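
The arithmetic behind that expectation can be sketched as follows. This is a
minimal Python illustration using only the figures quoted in this thread (55 C
datasheet maximum, 50 C probe reading, 5-10 C estimated probe-to-die offset).
It assumes the datasheet limit applies to the temperature being estimated,
which the thread does not settle (the figure may be ambient rather than die
temperature). The function name is made up.

```python
SPEC_MAX_C = 55.0                 # datasheet operating maximum quoted in the thread
PROBE_READING_C = 50.0            # external probe reading after the crash
DIE_OFFSET_RANGE_C = (5.0, 10.0)  # estimated gap between probe and die temperature


def margin(reading_c: float, offset_c: float, limit_c: float = SPEC_MAX_C) -> float:
    """Headroom in degrees C; zero or negative means at or past the limit."""
    return limit_c - (reading_c + offset_c)


for offset in DIE_OFFSET_RANGE_C:
    estimated = PROBE_READING_C + offset
    print(f"offset {offset:4.1f} C -> estimated {estimated:.1f} C, "
          f"margin {margin(PROBE_READING_C, offset):+.1f} C")
```

Under these assumptions the part has no headroom left at the low end of the
offset range and is about 5 C over the limit at the high end.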

-Andi


* RE: Update on e1000 troubles (over-heating!)
@ 2002-10-15  2:20 Feldman, Scott
  2002-10-15  2:37 ` Andi Kleen
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Feldman, Scott @ 2002-10-15  2:20 UTC (permalink / raw)
  To: 'Ben Greear'; +Cc: linux-kernel, 'netdev@oss.sgi.com'

> Here is the lspci information, both -x and -vv.  This is with 
> two of the e1000 single-port NICS side-by-side.  I have also 
> strapped a P-IV CPU fan on top of the two cards to blow some 
> air over them....running tests now to see if that actually 
> helps anything.  If it does, I'll be sure to send you a picture :)

Ben, I checked the datasheet for the part shown in the lspci dump, and it
shows an operating temperature of 0-55 degrees C.  You said you measured 50
degrees C, so you're within the safe range.  Did the fans help?

Here's the datasheet:
http://www.intel.com/network/connectivity/resources/doc_library/data_sheets/pro1000mt_sa.pdf

-scott


* Re: Update on e1000 troubles (over-heating!)
  2002-10-06  7:33 Feldman, Scott
@ 2002-10-08 18:44 ` Ben Greear
  0 siblings, 0 replies; 18+ messages in thread
From: Ben Greear @ 2002-10-08 18:44 UTC (permalink / raw)
  To: Feldman, Scott; +Cc: linux-kernel, 'netdev@oss.sgi.com'

[-- Attachment #1: Type: text/plain, Size: 932 bytes --]

Feldman, Scott wrote:
>>I believe I have figured out why the e1000 crashed my machine 
>>after .5 - 1 hours:  The NIC was over-heating.  I measured 
>>one of the NICs after the machine crashed with an external 
>>(cheap) temp probe.  It registered right at 50 degrees C, and 
>>this was about 15-30 seconds after it crashed.
> 
> 
> Ben, please send lspci -x on the hot nic.

Here is the lspci information, both -x and -vv.  This is with two of
the e1000 single-port NICS side-by-side.  I have also strapped a P-IV
CPU fan on top of the two cards to blow some air over them....running
tests now to see if that actually helps anything.  If it does, I'll
be sure to send you a picture :)

Thanks,
Ben

> 
> -scott
> 


-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear


[-- Attachment #2: lspci.txt --]
[-- Type: text/plain, Size: 10779 bytes --]

00:00.0 Host bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] System Controller (rev 11)
00: 22 10 0c 70 06 00 30 22 11 00 00 06 00 40 00 00
10: 08 00 00 f8 08 00 20 f6 91 10 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 00

00:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] AGP Bridge
00: 22 10 0d 70 07 00 20 02 00 00 04 06 00 40 01 00
10: 00 00 00 00 00 00 00 00 00 01 01 44 f1 01 20 22
20: f0 ff 00 00 f0 ff 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 ff 00 04 00

00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] ISA (rev 05)
00: 22 10 40 74 0f 00 20 02 05 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-768 [Opus] IDE (rev 04)
00: 22 10 41 74 05 00 00 02 04 8a 01 01 00 40 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 f0 00 00 00 00 00 00 00 00 00 00 22 10 41 74
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] ACPI (rev 03)
00: 22 10 43 74 00 00 80 02 03 00 80 06 00 40 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 22 10 43 74
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00:08.0 Ethernet controller: Intel Corp.: Unknown device 100f (rev 01)
00: 86 80 0f 10 17 00 30 02 01 00 00 02 10 40 00 00
10: 04 00 00 f4 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 10 00 00 00 00 00 00 00 00 00 00 86 80 01 10
30: 00 00 00 00 dc 00 00 00 00 00 00 00 0a 01 ff 00

00:09.0 Ethernet controller: Intel Corp.: Unknown device 100f (rev 01)
00: 86 80 0f 10 17 00 30 02 01 00 00 02 10 40 00 00
10: 04 00 02 f4 00 00 00 00 00 00 00 00 00 00 00 00
20: 41 10 00 00 00 00 00 00 00 00 00 00 86 80 01 10
30: 00 00 00 00 dc 00 00 00 00 00 00 00 09 01 ff 00

00:10.0 PCI bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] PCI (rev 05)
00: 22 10 48 74 17 00 20 22 05 00 04 06 00 63 01 00
10: 00 00 00 00 00 00 00 00 00 02 02 a8 20 20 00 22
20: 10 f4 f0 f5 f0 ff 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 ff 00 0c 00

02:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-768 [Opus] USB (rev 07)
00: 22 10 49 74 17 00 80 82 07 10 03 0c 00 40 00 00
10: 00 00 10 f4 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 22 10 49 74
30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 04 00 50

02:07.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00: 02 10 52 47 87 00 90 02 27 00 00 03 10 42 00 00
10: 00 00 00 f5 01 20 00 00 00 10 10 f4 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 02 10 08 80
30: 00 00 00 00 5c 00 00 00 00 00 00 00 ff 00 08 00

02:08.0 Ethernet controller: 3Com Corporation 3c980-TX 10/100baseTX NIC [Python-T] (rev 78)
00: b7 10 05 98 17 00 10 02 78 00 00 02 10 50 00 00
10: 01 24 00 00 00 20 10 f4 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 f1 10 62 24
30: 00 00 00 00 dc 00 00 00 00 00 00 00 0b 01 0a 0a

02:09.0 Ethernet controller: 3Com Corporation 3c980-TX 10/100baseTX NIC [Python-T] (rev 78)
00: b7 10 05 98 17 00 10 02 78 00 00 02 10 50 00 00
10: 81 24 00 00 00 24 10 f4 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 f1 10 62 24
30: 00 00 00 00 dc 00 00 00 00 00 00 00 05 01 0a 0a

00:00.0 Host bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] System Controller (rev 11)
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 64
	Region 0: Memory at f8000000 (32-bit, prefetchable) [size=64M]
	Region 1: Memory at f6200000 (32-bit, prefetchable) [size=4K]
	Region 2: I/O ports at 1090 [disabled] [size=4]
	Capabilities: [a0] AGP version 2.0
		Status: RQ=15 SBA+ 64bit- FW- Rate=x1,x2
		Command: RQ=0 SBA+ AGP+ 64bit- FW- Rate=<none>

00:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] AGP Bridge (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=68
	BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-

00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] ISA (rev 05)
	Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0

00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-768 [Opus] IDE (rev 04) (prog-if 8a [Master SecP PriP])
	Subsystem: Advanced Micro Devices [AMD] AMD-768 [Opus] IDE
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64
	Region 4: I/O ports at f000 [size=16]

00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] ACPI (rev 03)
	Subsystem: Advanced Micro Devices [AMD] AMD-768 [Opus] ACPI
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:08.0 Ethernet controller: Intel Corp.: Unknown device 100f (rev 01)
	Subsystem: Intel Corp.: Unknown device 1001
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64 (63750ns min), cache line size 10
	Interrupt: pin A routed to IRQ 10
	Region 0: Memory at f4000000 (64-bit, non-prefetchable) [size=128K]
	Region 4: I/O ports at 1000 [size=64]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [e4] PCI-X non-bridge device.
		Command: DPERE- ERO+ RBC=0 OST=0
		Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
	Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
		Address: 0000000000000000  Data: 0000

00:09.0 Ethernet controller: Intel Corp.: Unknown device 100f (rev 01)
	Subsystem: Intel Corp.: Unknown device 1001
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 64 (63750ns min), cache line size 10
	Interrupt: pin A routed to IRQ 9
	Region 0: Memory at f4020000 (64-bit, non-prefetchable) [size=128K]
	Region 4: I/O ports at 1040 [size=64]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [e4] PCI-X non-bridge device.
		Command: DPERE- ERO+ RBC=0 OST=0
		Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
	Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
		Address: 0000000000000000  Data: 0000

00:10.0 PCI bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] PCI (rev 05) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 99
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=168
	I/O behind bridge: 00002000-00002fff
	Memory behind bridge: f4100000-f5ffffff
	BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-

02:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-768 [Opus] USB (rev 07) (prog-if 10 [OHCI])
	Subsystem: Advanced Micro Devices [AMD] AMD-768 [Opus] USB
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR+
	Latency: 64 (20000ns max)
	Interrupt: pin D routed to IRQ 10
	Region 0: Memory at f4100000 (32-bit, non-prefetchable) [size=4K]

02:07.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA])
	Subsystem: ATI Technologies Inc: Unknown device 8008
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 66 (2000ns min), cache line size 10
	Region 0: Memory at f5000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: I/O ports at 2000 [size=256]
	Region 2: Memory at f4101000 (32-bit, non-prefetchable) [size=4K]
	Expansion ROM at <unassigned> [disabled] [size=128K]
	Capabilities: [5c] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

02:08.0 Ethernet controller: 3Com Corporation 3c980-TX 10/100baseTX NIC [Python-T] (rev 78)
	Subsystem: Tyan Computer: Unknown device 2462
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 80 (2500ns min, 2500ns max), cache line size 10
	Interrupt: pin A routed to IRQ 11
	Region 0: I/O ports at 2400 [size=128]
	Region 1: Memory at f4102000 (32-bit, non-prefetchable) [size=128]
	Expansion ROM at <unassigned> [disabled] [size=128K]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=2 PME-

02:09.0 Ethernet controller: 3Com Corporation 3c980-TX 10/100baseTX NIC [Python-T] (rev 78)
	Subsystem: Tyan Computer: Unknown device 2462
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 80 (2500ns min, 2500ns max), cache line size 10
	Interrupt: pin A routed to IRQ 5
	Region 0: I/O ports at 2480 [size=128]
	Region 1: Memory at f4102400 (32-bit, non-prefetchable) [size=128]
	Expansion ROM at <unassigned> [disabled] [size=128K]
	Capabilities: [dc] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=2 PME-
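
For anyone reading the hex rows in the -x portion of the dump above: the first
16 bytes of PCI config space hold little-endian vendor ID, device ID, command,
status and revision fields. The Python sketch below decodes the first row of
the 00:08.0 e1000 entry. The field offsets and bit masks are the standard PCI
header ones; the helper names are hypothetical.

```python
def parse_header_line(line: str) -> bytes:
    """Turn an `lspci -x` row such as '00: 86 80 ...' into raw bytes."""
    _offset, _, hex_bytes = line.partition(":")
    return bytes(int(b, 16) for b in hex_bytes.split())


def le16(data: bytes, offset: int) -> int:
    """Read a 16-bit little-endian field at the given offset."""
    return int.from_bytes(data[offset:offset + 2], "little")


# First row of the 00:08.0 entry, copied from the dump.
ROW_00 = "00: 86 80 0f 10 17 00 30 02 01 00 00 02 10 40 00 00"

hdr = parse_header_line(ROW_00)
vendor_id = le16(hdr, 0x00)
device_id = le16(hdr, 0x02)
command = le16(hdr, 0x04)
status = le16(hdr, 0x06)
revision = hdr[0x08]

print(f"vendor {vendor_id:04x} device {device_id:04x} rev {revision:02x}")
print(f"bus master enabled:  {bool(command & 0x0004)}")
print(f"capability list:     {bool(status & 0x0010)}")
print(f"66MHz capable:       {bool(status & 0x0020)}")
```

The decoded values line up with the -vv text for the same device: device 100f
rev 01, BusMaster+, Cap+ and 66Mhz+.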



* RE: Update on e1000 troubles (over-heating!)
@ 2002-10-06  7:33 Feldman, Scott
  2002-10-08 18:44 ` Ben Greear
  0 siblings, 1 reply; 18+ messages in thread
From: Feldman, Scott @ 2002-10-06  7:33 UTC (permalink / raw)
  To: 'Ben Greear'; +Cc: linux-kernel, 'netdev@oss.sgi.com'

> I believe I have figured out why the e1000 crashed my machine 
> after .5 - 1 hours:  The NIC was over-heating.  I measured 
> one of the NICs after the machine crashed with an external 
> (cheap) temp probe.  It registered right at 50 degrees C, and 
> this was about 15-30 seconds after it crashed.

Ben, please send lspci -x on the hot nic.

-scott


end of thread, other threads:[~2002-10-15  7:01 UTC | newest]

Thread overview: 18+ messages
2002-10-06  3:38 Update on e1000 troubles (over-heating!) Ben Greear
2002-10-06  3:47 ` Andre Hedrick
2002-10-06 22:38   ` jamal
2002-10-07  0:14     ` Andre Hedrick
2002-10-07 11:56       ` jamal
2002-10-07  3:46     ` Ben Greear
2002-10-07  5:26       ` David S. Miller
2002-10-07 11:53       ` jamal
2002-10-07 11:58         ` David S. Miller
2002-10-07 16:40         ` Ben Greear
2002-10-06  7:33 Feldman, Scott
2002-10-08 18:44 ` Ben Greear
2002-10-15  2:20 Feldman, Scott
2002-10-15  2:37 ` Andi Kleen
2002-10-15  2:54   ` Jonathan Lundell
2002-10-15  5:42 ` Dave Hansen
2002-10-15  7:07   ` Ben Greear
2002-10-15  7:01 ` Ben Greear
