linux-kernel.vger.kernel.org archive mirror
* tg3 bad performance, lots of hardware interrupts
@ 2008-03-27 13:53 Harald Hannelius
  2008-03-27 21:49 ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Harald Hannelius @ 2008-03-27 13:53 UTC (permalink / raw)
  To: linux-kernel


Hi there,

I experience a lot of hardware interrupts with a BCM5751 PCI-express NIC 
(tg3). modprobe tg3, ifconfig ethX up and friends makes the system 
unresponsive. Just having the interface up makes the system sluggish.

Onboard forcedeth works fine (with the same cable).

iperf gives me just 2Mbps on a 1Gbps ethernet. Load average near 1.0. top 
reports 40-50%hi (hardware interrupts) when generating traffic over that 
interface.

The system is a Supermicro H8SMI-2 motherboard, HP EA833AA BROADCOM 
NETXTREME PCI-express NIC, 2GB RAM, Dual-Core opteron 2.8GHz.

All of our other servers with Broadcom NICs work fine with tg3, but they
aren't PCI-express.

I have tried booting with pci=nomsi and pci=routeirq (BIOS with or without
"PnP OS" enabled). The Ubuntu 7.10 live CD gives the same result. I haven't
tried other OSes.

irq_balancing enabled.

Kernel conf: http://www.iki.fi/~harald/kernconf.gz (9kB).

Any hints on what to check for? Is this a hardware or a tg3 driver problem?


I would be glad to report more info if needed. I have profiling support,
but I have never profiled a kernel before. I don't have any other PCI-e
NICs to test with yet.


# uname -r
2.6.24.4

# dmesg|grep tg
tg3.c:v3.86 (November 9, 2007)

# lspci -vvv
07:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5751 Gigabit Ethernet PCI Express (rev 21)
 	Subsystem: Broadcom Corporation NetXtreme BCM5751 Gigabit Ethernet PCI Express
 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
 	Latency: 0, Cache Line Size: 64 bytes
 	Interrupt: pin A routed to IRQ 216
 	Region 0: Memory at febf0000 (64-bit, non-prefetchable) [size=64K]
 	Expansion ROM at febe0000 [disabled] [size=64K]
 	Capabilities: [48] Power Management version 2
 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
 		Status: D0 PME-Enable+ DSel=0 DScale=1 PME-
 	Capabilities: [50] Vital Product Data
 	Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable+
 		Address: 00000000fee0300c  Data: 4142
 	Capabilities: [d0] Express Endpoint IRQ 0
 		Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag+
 		Device: Latency L0s <4us, L1 unlimited
 		Device: AtnBtn- AtnInd- PwrInd-
 		Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
 		Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
 		Device: MaxPayload 128 bytes, MaxReadReq 4096 bytes
 		Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s L1, Port 0
 		Link: Latency L0s <4us, L1 <64us
 		Link: ASPM L0s L1 Enabled RCB 64 bytes CommClk+ ExtSynch-
 		Link: Speed 2.5Gb/s, Width x1


-- 
A: Top Posters!                                      |  s/y Charlotta |
Q: What is the most annoying thing on mailing lists? |    FIN-2674    |
   http://www.fe83.org/ Finn Express Purjehtijat ry   |  ============= |
Harald H Hannelius | harald (At) iki (dot) fi | GSM +358 50 594 1020


* Re: tg3 bad performance, lots of hardware interrupts
  2008-03-27 13:53 tg3 bad performance, lots of hardware interrupts Harald Hannelius
@ 2008-03-27 21:49 ` David Miller
  2008-03-28  1:01   ` Michael Chan
  0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2008-03-27 21:49 UTC (permalink / raw)
  To: harald; +Cc: linux-kernel, netdev, mchan

From: Harald Hannelius <harald@iki.fi>
Date: Thu, 27 Mar 2008 15:53:56 +0200 (EET)

You'll have better luck if you report this to the networking
developer list (netdev) and the tg3 driver maintainer (Michael
Chan), who are both now CC:'d.

> I experience a lot of hardware interrupts with a BCM5751 PCI-express NIC 
> (tg3). modprobe tg3, ifconfig ethX up and friends makes the system 
> unresponsive. Just having the interface up makes the system sluggish.
> 
> Onboard forcedeth works fine (with the same cable).
> 
> iperf gives me just 2Mbps on a 1Gbps ethernet. Load average near 1.0. top 
> reports 40-50%hi (hardware interrupts) when generating traffic over that 
> interface.
> 
> The system is a Supermicro H8SMI-2 motherboard, HP EA833AA BROADCOM 
> NETXTREME PCI-express NIC, 2GB RAM, Dual-Core opteron 2.8GHz.
> 
> All of our other servers with Broadcom NICs work fine with tg3, but they
> aren't PCI-express.
> 
> I have tried booting with pci=nomsi and pci=routeirq (BIOS with or without
> "PnP OS" enabled). The Ubuntu 7.10 live CD gives the same result. I haven't
> tried other OSes.
> 
> irq_balancing enabled.
> 
> Kernel conf: http://www.iki.fi/~harald/kernconf.gz (9kB).
> 
> Any hints on what to check for? Is this a hardware or a tg3 driver problem?
> 
> 
> I would be glad to report more info if needed. I have profiling support,
> but I have never profiled a kernel before. I don't have any other PCI-e
> NICs to test with yet.
> 
> 
> # uname -r
> 2.6.24.4
> 
> # dmesg|grep tg
> tg3.c:v3.86 (November 9, 2007)
> 
> # lspci -vvv
> 07:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5751 Gigabit Ethernet PCI Express (rev 21)
>  	Subsystem: Broadcom Corporation NetXtreme BCM5751 Gigabit Ethernet PCI Express
>  	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
>  	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
>  	Latency: 0, Cache Line Size: 64 bytes
>  	Interrupt: pin A routed to IRQ 216
>  	Region 0: Memory at febf0000 (64-bit, non-prefetchable) [size=64K]
>  	Expansion ROM at febe0000 [disabled] [size=64K]
>  	Capabilities: [48] Power Management version 2
>  		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
>  		Status: D0 PME-Enable+ DSel=0 DScale=1 PME-
>  	Capabilities: [50] Vital Product Data
>  	Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable+
>  		Address: 00000000fee0300c  Data: 4142
>  	Capabilities: [d0] Express Endpoint IRQ 0
>  		Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag+
>  		Device: Latency L0s <4us, L1 unlimited
>  		Device: AtnBtn- AtnInd- PwrInd-
>  		Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
>  		Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>  		Device: MaxPayload 128 bytes, MaxReadReq 4096 bytes
>  		Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s L1, Port 0
>  		Link: Latency L0s <4us, L1 <64us
>  		Link: ASPM L0s L1 Enabled RCB 64 bytes CommClk+ ExtSynch-
>  		Link: Speed 2.5Gb/s, Width x1
> 
> 


* Re: tg3 bad performance, lots of hardware interrupts
  2008-03-27 21:49 ` David Miller
@ 2008-03-28  1:01   ` Michael Chan
  2008-03-28 13:04     ` Harald Hannelius
  0 siblings, 1 reply; 11+ messages in thread
From: Michael Chan @ 2008-03-28  1:01 UTC (permalink / raw)
  To: David Miller; +Cc: harald, linux-kernel, netdev

On Thu, 2008-03-27 at 14:49 -0700, David Miller wrote:
> From: Harald Hannelius <harald@iki.fi>
> Date: Thu, 27 Mar 2008 15:53:56 +0200 (EET)
> 
> > I experience a lot of hardware interrupts with a BCM5751 PCI-express NIC 
> > (tg3). modprobe tg3, ifconfig ethX up and friends makes the system 
> > unresponsive. Just having the interface up makes the system sluggish.
> > 

I just tested a similar NIC using the same kernel and driver, but I did
not notice anything unusual.  netperf gave me 941Mbps.

> > Onboard forcedeth works fine (with the same cable).
> > 
> > iperf gives me just 2Mbps on a 1Gbps ethernet. Load average near 1.0. top 
> > reports 40-50%hi (hardware interrupts) when generating traffic over that 
> > interface.

Can you look at /proc/interrupts to see roughly how many are reported
per second when link is down, link is up with no traffic, and with
traffic?
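
One way to get those per-second figures is to snapshot /proc/interrupts
twice and divide the difference by the interval. The following is only a
minimal Python sketch under that assumption; the helper names
(parse_interrupts, rates, sample) are made up for illustration and are not
part of any tool mentioned in this thread.

```python
import time


def parse_interrupts(text):
    """Return {irq_label: total count summed over all CPUs}."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Per-CPU counters come first; rows like ERR/MIS carry fewer columns.
        counts = [int(f) for f in fields[:ncpus] if f.isdigit()]
        totals[label.strip()] = sum(counts)
    return totals


def rates(before, after, seconds):
    """Interrupts per second for every IRQ present in both snapshots."""
    return {irq: (after[irq] - before[irq]) / seconds
            for irq in after if irq in before}


def sample(interval=2.0, path="/proc/interrupts"):
    """Take two snapshots `interval` seconds apart and return the rates."""
    with open(path) as f:
        first = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_interrupts(f.read())
    return rates(first, second, interval)
```

Calling sample() once with the link down, once with the link up but idle,
and once under traffic would give the three numbers asked for above.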

Finally, you can also try ethtool -t eth0 to see if it passes a simple
self test.



* Re: tg3 bad performance, lots of hardware interrupts
  2008-03-28  1:01   ` Michael Chan
@ 2008-03-28 13:04     ` Harald Hannelius
  2008-03-28 17:49       ` Michael Chan
  0 siblings, 1 reply; 11+ messages in thread
From: Harald Hannelius @ 2008-03-28 13:04 UTC (permalink / raw)
  To: Michael Chan; +Cc: David Miller, linux-kernel, netdev


On Thu, 27 Mar 2008, Michael Chan wrote:

> On Thu, 2008-03-27 at 14:49 -0700, David Miller wrote:
>> From: Harald Hannelius <harald@iki.fi>
>> Date: Thu, 27 Mar 2008 15:53:56 +0200 (EET)
>>
>>> I experience a lot of hardware interrupts with a BCM5751 PCI-express NIC
>>> (tg3). modprobe tg3, ifconfig ethX up and friends makes the system
>>> unresponsive. Just having the interface up makes the system sluggish.
>>>
>
> I just tested a similar NIC using the same kernel and driver, but I did
> not notice anything unusual.  netperf gave me 941Mbps.
>
>>> Onboard forcedeth works fine (with the same cable).
>>>
>>> iperf gives me just 2Mbps on a 1Gbps ethernet. Load average near 1.0. top
>>> reports 40-50%hi (hardware interrupts) when generating traffic over that
>>> interface.
>
> Can you look at /proc/interrupts to see roughly how many are reported
> per second when link is down, link is up with no traffic, and with
> traffic?
>
> Finally, you can also try ethtool -t eth0 to see if it passes a simple
> self test.

Phew, I thought that running ethtool -t was like doing stop-A-sync on a 
Sun. It took almost half an hour to run that ethtool -t command;

'mpstat 2' output while running 'ethtool -t eth2':

Full log here (2160 lines): http://www.iki.fi/~harald/mpstat.log

A few lines from that log here;

Linux 2.6.24.4 (mauer)  03/28/2008

10:37:14 AM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
10:37:16 AM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00    115.50
10:37:18 AM  all    0.00    0.00    1.00    0.00    0.00    0.00    0.00   99.00    123.50
10:37:20 AM  all    0.00    0.00    0.75    0.00    0.00    0.00    0.00   99.25    142.79
10:37:23 AM  all    0.00    0.00   25.66    0.00   23.77    2.26    0.00   48.30     76.38
10:37:25 AM  all    0.00    0.00   17.13    0.00   31.12   21.68    0.00   30.07     75.12
10:37:27 AM  all    0.00    0.00   12.17    0.00   25.48   28.90    0.00   33.46    103.00
10:37:29 AM  all    0.00    0.00    0.40    0.00   38.15   23.29    0.00   38.15     70.15
10:37:31 AM  all    0.00    0.00    0.40    0.00   41.90   23.32    0.00   34.39     71.43
10:37:33 AM  all    0.00    0.00    2.83    0.00   29.68   34.98    0.00   32.51    123.38
10:37:35 AM  all    0.00    0.00   11.07    0.00   33.21   24.81    0.00   30.92     78.74
... goes on like this for almost half-an-hour ...
11:04:03 AM  all    0.00    0.00    0.00    0.00   34.69   22.86    0.00   42.45     71.50
11:04:05 AM  all    0.00    0.00    1.53    0.00   33.72   26.82    0.00   37.93     84.50
11:04:07 AM  all    0.00    0.00    0.40    0.00   38.15   22.89    0.00   38.55     74.26
11:04:09 AM  all    0.00    0.00    0.00    0.00   38.80   21.60    0.00   39.60     72.64
11:04:11 AM  all    0.00    0.00    0.00    0.00   42.17   23.29    0.00   34.54     69.50
11:04:13 AM  all    0.00    0.00    3.60    1.44   31.29   33.45    0.00   30.22    111.39
11:04:15 AM  all    0.00    0.00    0.00    0.00    0.51    0.76    0.00   98.73    110.50
11:04:17 AM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00    119.50
11:04:19 AM  all    0.00    0.00    0.25    1.24    0.00    0.00    0.00   98.51    126.87
11:04:21 AM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00    106.50
... command finished ...
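
To condense a long mpstat log like this into a few numbers, the rows can be
parsed and averaged. This is a rough Python sketch, assuming the 12-hour
clock column layout shown above; parse_mpstat, busy_rows and mean are
hypothetical helper names, and the 5% threshold is an arbitrary choice.

```python
def parse_mpstat(text):
    """Return one {column: value} dict per data row of mpstat output."""
    header, rows = None, []
    for line in text.splitlines():
        fields = line.split()
        if "%irq" in fields:
            header = fields[3:]  # column names after time, AM/PM and CPU
            continue
        if header is None or len(fields) != len(header) + 3:
            continue  # banner, blank line or free-text remark
        try:
            rows.append(dict(zip(header, map(float, fields[3:]))))
        except ValueError:
            continue  # right width, but not numeric: not a data row
    return rows


def busy_rows(rows, threshold=5.0):
    """Rows where hard plus soft interrupt time exceeds `threshold` percent."""
    return [r for r in rows if r["%irq"] + r["%soft"] > threshold]


def mean(rows, column):
    """Average of one column, or 0.0 when there are no rows."""
    return sum(r[column] for r in rows) / len(rows) if rows else 0.0
```

For example, mean(busy_rows(parse_mpstat(log)), "%irq") gives the average
hard-interrupt share over the intervals in which the test was running.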

Sadly I didn't capture the output from ethtool -t, failed anyway. I'm
running that test a second time right now and will post the output later
(when finished). While 'ethtool -t eth2' is running, eth0 (forcedeth)
doesn't respond to ping; when I went to the console, it didn't wake up on
a keypress, nor did capslock/numlock react.

'mpstat 2' while running modprobe tg3:
Linux 2.6.24.4 (mauer)  03/28/2008

02:47:05 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
02:47:07 PM  all    0.00    0.00    0.75    0.00    0.00    0.00    0.00   99.25    138.00
02:47:09 PM  all    0.00    0.00    0.25    0.00    0.00    0.00    0.00   99.75    120.00
02:47:11 PM  all    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00    132.50
02:47:13 PM  all    0.00    0.00   14.19    0.00   17.16    1.65    0.00   67.00     92.08
02:47:15 PM  all    0.00    0.00   19.60    0.00   30.40    1.20    0.00   48.80     60.50
02:47:17 PM  all    0.56    0.00    4.20    0.00    5.88    0.28    0.00   89.08    127.14
02:47:19 PM  all    0.00    0.00    0.75    0.00    0.25    0.00    0.00   99.00    127.00

'mpstat 2' while running ifconfig eth2 up:
Linux 2.6.24.4 (mauer)  03/28/2008

02:47:50 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
02:47:52 PM  all    0.00    0.00    0.40    0.00   44.95   16.49    0.00   38.16    194.47
02:47:54 PM  all    0.00    0.00    0.76    0.00   46.91   16.77    0.00   35.56    205.47
02:47:56 PM  all    0.00    0.00    2.06    0.00   15.88   14.12    0.00   67.94    107.96

Hope my overlong lines don't get wrapped...


No hints in dmesg about what's going on.





* Re: tg3 bad performance, lots of hardware interrupts
  2008-03-28 17:49       ` Michael Chan
@ 2008-03-28 17:12         ` Jiri Kosina
  2008-03-28 17:37           ` Harald Hannelius
  2008-03-28 17:31         ` Harald Hannelius
  1 sibling, 1 reply; 11+ messages in thread
From: Jiri Kosina @ 2008-03-28 17:12 UTC (permalink / raw)
  To: Michael Chan; +Cc: Harald Hannelius, David Miller, linux-kernel, netdev

On Fri, 28 Mar 2008, Michael Chan wrote:

> > Phew, I thought that running ethtool -t was like doing stop-A-sync on 
> > a Sun. It took almost half an hour to run that ethtool -t command;
> Something is very wrong.  ethtool -t should only take a few seconds to
> complete.  You can try ethtool -t eth0 online to reduce the number of
> tests to see if it makes a difference.
> How many of these NICs do you have?  If you have more than one, do they
> all behave the same way?  Have they ever worked well before?

Harald, is the IRQ of eth0 shared with any other device? (cat 
/proc/interrupts will show).

-- 
Jiri Kosina
SUSE Labs



* Re: tg3 bad performance, lots of hardware interrupts
  2008-03-28 17:49       ` Michael Chan
  2008-03-28 17:12         ` Jiri Kosina
@ 2008-03-28 17:31         ` Harald Hannelius
  1 sibling, 0 replies; 11+ messages in thread
From: Harald Hannelius @ 2008-03-28 17:31 UTC (permalink / raw)
  To: Michael Chan; +Cc: David Miller, linux-kernel, netdev


On Fri, 28 Mar 2008, Michael Chan wrote:
> On Fri, 2008-03-28 at 15:04 +0200, Harald Hannelius wrote:

>> Phew, I thought that running ethtool -t was like doing stop-A-sync on
>> a
>> Sun. It took almost half an hour to run that ethtool -t command;
>
> Something is very wrong.  ethtool -t should only take a few seconds to
> complete.  You can try ethtool -t eth0 online to reduce the number of
> tests to see if it makes a difference.

Here's the output of ethtool -t eth2:

The test result is PASS
The test extra info:
nvram test     (online)          0
link test      (online)          0
register test  (offline)         0
memory test    (offline)         0
loopback test  (offline)         0
interrupt test (offline)         0

I just ran 'ethtool -t eth2 online' and that one took only about 10
seconds.

# ethtool -t eth2 online 2>&1 | tee ethtool-output.log
The test result is PASS
The test extra info:
nvram test     (online) 	 0
link test      (online) 	 0
register test  (offline)	 0
memory test    (offline)	 0
loopback test  (offline)	 0
interrupt test (offline)	 0
# mpstat 2 2>&1 | tee ethtool2.log
Linux 2.6.24.4 (mauer) 	03/28/2008

05:22:42 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
05:22:44 PM  all    0.00    0.00    0.00    0.00   11.63    7.56    0.00   80.81    102.99
05:22:46 PM  all    0.00    0.00    0.27    0.00    8.33    4.84    0.00   86.56    123.00
05:22:48 PM  all    0.00    0.00    0.00    0.00   12.03    7.16    0.00   80.80    108.46
05:22:50 PM  all    0.00    0.00    0.00    0.00   10.47    5.23    0.00   84.30    113.93
05:22:52 PM  all    0.00    0.00    0.00    0.00   12.93   10.06    0.00   77.01    135.00
05:22:55 PM  all    0.00    0.00   18.11    0.00   24.25   12.62    0.00   45.02    158.28
05:22:57 PM  all    0.00    0.00    0.82    0.00   13.11   10.38    0.00   75.68    146.04
05:22:59 PM  all    0.00    0.00    0.29    0.00   18.77   12.61    0.00   68.33    136.32
05:23:01 PM  all    0.00    0.00    0.00    0.00   13.64    7.67    0.00   78.69    112.50
05:23:03 PM  all    0.00    0.00    0.00    0.00    9.19    4.86    0.00   85.95    110.50
05:23:05 PM  all    0.00    0.00    0.00    0.00   13.64    7.67    0.00   78.69    108.00

> How many of these NICs do you have?  If you have more than one, do they
> all behave the same way?  Have they ever worked well before?

It's a brand spanking new computer, equipped with three of these BCM5751 
NICs made by HP. I have ripped out all but one, and I have also tested the 
NIC in all three available PCIe slots. Same result.

When all three NICs were plugged in, they didn't work. All three behave
the same when tested one by one.

I don't know if they have worked before.



* Re: tg3 bad performance, lots of hardware interrupts
  2008-03-28 17:12         ` Jiri Kosina
@ 2008-03-28 17:37           ` Harald Hannelius
  2008-03-28 19:06             ` Michael Chan
  2008-04-02  8:55             ` Harald Hannelius
  0 siblings, 2 replies; 11+ messages in thread
From: Harald Hannelius @ 2008-03-28 17:37 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Michael Chan, David Miller, linux-kernel, netdev


On Fri, 28 Mar 2008, Jiri Kosina wrote:
> On Fri, 28 Mar 2008, Michael Chan wrote:

>>> Phew, I thought that running ethtool -t was like doing stop-A-sync on
>>> a Sun. It took almost half an hour to run that ethtool -t command;
>> Something is very wrong.  ethtool -t should only take a few seconds to
>> complete.  You can try ethtool -t eth0 online to reduce the number of
>> tests to see if it makes a difference.
>> How many of these NICs do you have?  If you have more than one, do they
>> all behave the same way?  Have they ever worked well before?
>
> Harald, is the IRQ of eth0 shared with any other device? (cat
> /proc/interrupts will show).

# cat /proc/interrupts
            CPU0       CPU1
   0:        111          1   IO-APIC-edge      timer
   1:          0          2   IO-APIC-edge      i8042
   2:          0          0    XT-PIC-XT        cascade
   5:          0          0   IO-APIC-fasteoi   sata_nv
   7:        856         51   IO-APIC-fasteoi   ohci_hcd:usb2
  10:          0          3   IO-APIC-fasteoi   sata_nv, ehci_hcd:usb1
  11:       4305          7   IO-APIC-fasteoi   sata_nv
  12:          0          4   IO-APIC-edge      i8042
216:       4217     128932   PCI-MSI-edge      eth2
217:     161107     685351   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
LOC:    2380762    2619917   Local timer interrupts
RES:       3000       3269   Rescheduling interrupts
CAL:         16         31   function call interrupts
TLB:         64        111   TLB shootdowns
TRM:          0          0   Thermal event interrupts
SPU:          0          0   Spurious interrupts
ERR:          1
MIS:          0

Well, shared or not, yes and no. I think that /proc/interrupts contains 
soft-interrupts. The problem child is interface eth2.

As reported by ifconfig, the interface is on IRQ 5:

# ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 00:10:18:30:E6:D6
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:196898 errors:0 dropped:0 overruns:0 frame:0
           TX packets:19 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:69887991 (66.6 MiB)  TX bytes:1216 (1.1 KiB)
           Interrupt:5

That'd be the same as sata_nv.

I changed the "PnP OS" setting in the BIOS (acpi on/off?) and tried
booting both with and without pci=routeirq (or something like that, see
original post), to no avail.

I'm stumped. I have never experienced anything quite like this before.
Usually an IRQ conflict has crashed my computers, not just slowed them
down (or maybe these dual-core Opterons are just so incredibly fast
nowadays that they do nothing incredibly fast :) ). Then again, I haven't
had an IRQ conflict on my boxen in years.

Buggy motherboard? Buggy NIC? The motherboard has the latest available 
BIOS as per supermicro's webpage.

I'm getting three PCIe e1000s next week; I'll try with those instead.




* Re: tg3 bad performance, lots of hardware interrupts
  2008-03-28 13:04     ` Harald Hannelius
@ 2008-03-28 17:49       ` Michael Chan
  2008-03-28 17:12         ` Jiri Kosina
  2008-03-28 17:31         ` Harald Hannelius
  0 siblings, 2 replies; 11+ messages in thread
From: Michael Chan @ 2008-03-28 17:49 UTC (permalink / raw)
  To: Harald Hannelius; +Cc: David Miller, linux-kernel, netdev

On Fri, 2008-03-28 at 15:04 +0200, Harald Hannelius wrote:
> Phew, I thought that running ethtool -t was like doing stop-A-sync on
> a 
> Sun. It took almost half an hour to run that ethtool -t command;

Something is very wrong.  ethtool -t should only take a few seconds to
complete.  You can try ethtool -t eth0 online to reduce the number of
tests to see if it makes a difference.

How many of these NICs do you have?  If you have more than one, do they
all behave the same way?  Have they ever worked well before?



* Re: tg3 bad performance, lots of hardware interrupts
  2008-03-28 19:06             ` Michael Chan
@ 2008-03-28 18:09               ` Harald Hannelius
  0 siblings, 0 replies; 11+ messages in thread
From: Harald Hannelius @ 2008-03-28 18:09 UTC (permalink / raw)
  To: Michael Chan; +Cc: Jiri Kosina, David Miller, linux-kernel, netdev


On Fri, 28 Mar 2008, Michael Chan wrote:

> On Fri, 2008-03-28 at 19:37 +0200, Harald Hannelius wrote:
>> # cat /proc/interrupts
>>             CPU0       CPU1
>>    0:        111          1   IO-APIC-edge      timer
>>    1:          0          2   IO-APIC-edge      i8042
>>    2:          0          0    XT-PIC-XT        cascade
>>    5:          0          0   IO-APIC-fasteoi   sata_nv
>>    7:        856         51   IO-APIC-fasteoi   ohci_hcd:usb2
>>   10:          0          3   IO-APIC-fasteoi   sata_nv, ehci_hcd:usb1
>>   11:       4305          7   IO-APIC-fasteoi   sata_nv
>>   12:          0          4   IO-APIC-edge      i8042
>> 216:       4217     128932   PCI-MSI-edge      eth2
>> 217:     161107     685351   PCI-MSI-edge      eth0
>> NMI:          0          0   Non-maskable interrupts
>> LOC:    2380762    2619917   Local timer interrupts
>> RES:       3000       3269   Rescheduling interrupts
>> CAL:         16         31   function call interrupts
>> TLB:         64        111   TLB shootdowns
>> TRM:          0          0   Thermal event interrupts
>> SPU:          0          0   Spurious interrupts
>> ERR:          1
>> MIS:          0
>>
>> Well, shared or not, yes and no. I think that /proc/interrupts
>> contains
>> soft-interrupts. The problem child is interface eth2.
>>
>> As reported by ifconfig, the interface is on IRQ 5:
>
> eth2 is using MSI.  When using MSI, the IRQ reported by ifconfig is not
> accurate.  You said you have tried booting with nomsi, but have you
> confirmed that by checking /proc/interrupts?

I did; at least the interface wasn't on MSI-edge anymore. I cannot
remember which IRQ the NIC took when booted with pci=nomsi (or something
like that); it could have been 10 or some other IRQ below 20, if I recall
correctly.




* Re: tg3 bad performance, lots of hardware interrupts
  2008-03-28 17:37           ` Harald Hannelius
@ 2008-03-28 19:06             ` Michael Chan
  2008-03-28 18:09               ` Harald Hannelius
  2008-04-02  8:55             ` Harald Hannelius
  1 sibling, 1 reply; 11+ messages in thread
From: Michael Chan @ 2008-03-28 19:06 UTC (permalink / raw)
  To: Harald Hannelius; +Cc: Jiri Kosina, David Miller, linux-kernel, netdev

On Fri, 2008-03-28 at 19:37 +0200, Harald Hannelius wrote:
> # cat /proc/interrupts
>             CPU0       CPU1
>    0:        111          1   IO-APIC-edge      timer
>    1:          0          2   IO-APIC-edge      i8042
>    2:          0          0    XT-PIC-XT        cascade
>    5:          0          0   IO-APIC-fasteoi   sata_nv
>    7:        856         51   IO-APIC-fasteoi   ohci_hcd:usb2
>   10:          0          3   IO-APIC-fasteoi   sata_nv, ehci_hcd:usb1
>   11:       4305          7   IO-APIC-fasteoi   sata_nv
>   12:          0          4   IO-APIC-edge      i8042
> 216:       4217     128932   PCI-MSI-edge      eth2
> 217:     161107     685351   PCI-MSI-edge      eth0
> NMI:          0          0   Non-maskable interrupts
> LOC:    2380762    2619917   Local timer interrupts
> RES:       3000       3269   Rescheduling interrupts
> CAL:         16         31   function call interrupts
> TLB:         64        111   TLB shootdowns
> TRM:          0          0   Thermal event interrupts
> SPU:          0          0   Spurious interrupts
> ERR:          1
> MIS:          0
> 
> Well, shared or not, yes and no. I think that /proc/interrupts
> contains 
> soft-interrupts. The problem child is interface eth2.
> 
> As reported by ifconfig, the interface is on IRQ 5:

eth2 is using MSI.  When using MSI, the IRQ reported by ifconfig is not
accurate.  You said you have tried booting with nomsi, but have you
confirmed that by checking /proc/interrupts?
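
A quick way to check which interrupt line an interface really uses, and
whether it is MSI or shared, is to look up its row in /proc/interrupts
rather than trusting ifconfig. The Python sketch below assumes the
2.6.24-era layout quoted above (per-CPU counters, one chip-type token, then
a comma-separated device list); find_irq is a hypothetical helper name, and
newer kernels format these rows differently.

```python
def find_irq(text, device):
    """Return details of the first /proc/interrupts row listing `device`."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) <= ncpus + 1:
            continue  # rows such as ERR/MIS have no chip or device column
        chip = fields[ncpus]
        devices = [d.strip(",") for d in fields[ncpus + 1:]]
        if device in devices:
            return {
                "irq": label.strip(),
                "chip": chip,
                "msi": "MSI" in chip,
                "shared_with": [d for d in devices if d != device],
            }
    return None  # device not listed, e.g. interface is down
```

After booting with pci=nomsi, the "msi" field for eth2 should come back
False, and "shared_with" shows whether the line is shared with another
device.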

> 
> # ifconfig eth2
> eth2      Link encap:Ethernet  HWaddr 00:10:18:30:E6:D6
>            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>            RX packets:196898 errors:0 dropped:0 overruns:0 frame:0
>            TX packets:19 errors:0 dropped:0 overruns:0 carrier:0
>            collisions:0 txqueuelen:1000
>            RX bytes:69887991 (66.6 MiB)  TX bytes:1216 (1.1 KiB)
>            Interrupt:5
> 



* Re: tg3 bad performance, lots of hardware interrupts
  2008-03-28 17:37           ` Harald Hannelius
  2008-03-28 19:06             ` Michael Chan
@ 2008-04-02  8:55             ` Harald Hannelius
  1 sibling, 0 replies; 11+ messages in thread
From: Harald Hannelius @ 2008-04-02  8:55 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Michael Chan, David Miller, linux-kernel, netdev


On Fri, 28 Mar 2008, Harald Hannelius wrote:
> On Fri, 28 Mar 2008, Jiri Kosina wrote:
>> On Fri, 28 Mar 2008, Michael Chan wrote:
>
>>> Something is very wrong.  ethtool -t should only take a few seconds to
>>> complete.  You can try ethtool -t eth0 online to reduce the number of
>>> tests to see if it makes a difference.
>>> How many of these NICs do you have?  If you have more than one, do they
>>> all behave the same way?  Have they ever worked well before?
>> 
>> Harald, is the IRQ of eth0 shared with any other device? (cat
>> /proc/interrupts will show).
>
> # cat /proc/interrupts
>           CPU0       CPU1
>  0:        111          1   IO-APIC-edge      timer
>  1:          0          2   IO-APIC-edge      i8042
>  2:          0          0    XT-PIC-XT        cascade
>  5:          0          0   IO-APIC-fasteoi   sata_nv
>  7:        856         51   IO-APIC-fasteoi   ohci_hcd:usb2
> 10:          0          3   IO-APIC-fasteoi   sata_nv, ehci_hcd:usb1
> 11:       4305          7   IO-APIC-fasteoi   sata_nv
> 12:          0          4   IO-APIC-edge      i8042
> 216:       4217     128932   PCI-MSI-edge      eth2
> 217:     161107     685351   PCI-MSI-edge      eth0
> NMI:          0          0   Non-maskable interrupts
> LOC:    2380762    2619917   Local timer interrupts
> RES:       3000       3269   Rescheduling interrupts
> CAL:         16         31   function call interrupts
> TLB:         64        111   TLB shootdowns
> TRM:          0          0   Thermal event interrupts
> SPU:          0          0   Spurious interrupts
> ERR:          1
> MIS:          0
>
> Well, shared or not, yes and no. I think that /proc/interrupts contains 
> soft-interrupts. The problem child is interface eth2.
>
> As reported by ifconfig, the interface is on IRQ 5:
>
> # ifconfig eth2
> eth2      Link encap:Ethernet  HWaddr 00:10:18:30:E6:D6
>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>          RX packets:196898 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:19 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:1000
>          RX bytes:69887991 (66.6 MiB)  TX bytes:1216 (1.1 KiB)
>          Interrupt:5
>
> That'd be the same as sata_nv.
>
> # ifconfig eth2
> eth2      Link encap:Ethernet  HWaddr 00:10:18:30:E6:D6
>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>          RX packets:196898 errors:0 dropped:0 overruns:0 frame:0
>          TX packets:19 errors:0 dropped:0 overruns:0 carrier:0
>          collisions:0 txqueuelen:1000
>          RX bytes:69887991 (66.6 MiB)  TX bytes:1216 (1.1 KiB)
>          Interrupt:5
>
> I changed the "PnP OS" setting in the BIOS (acpi on/off?) and tried booting
> both with and without pci=routeirq (or something like that, see original
> post), to no avail.
>
> I'm stumped. I have never experienced anything quite like this before.
> Usually an IRQ conflict has crashed my computers, not just slowed them down
> (or maybe these dual-core Opterons are just so incredibly fast nowadays
> that they do nothing incredibly fast :) ). Then again, I haven't had an
> IRQ conflict on my boxen in years.
>
> Buggy motherboard? Buggy NIC? The motherboard has the latest available BIOS
> as per supermicro's webpage.
>
> I'm getting three PCIe e1000s next week; I'll try with those instead.

For the record, I popped in a couple of PCI-express e1000s and they work
flawlessly. It's either the interaction between those HP cards and the
motherboard, or something with the tg3 driver, I suppose.

Funny though, that e1000e didn't detect the cards, but e1000 did.

