* RE: [2.4] e1000 leaking pci mappings on rx errors?
@ 2004-01-12 17:43 Feldman, Scott
  2004-01-12 20:47 ` Olof Johansson
  0 siblings, 1 reply; 6+ messages in thread
From: Feldman, Scott @ 2004-01-12 17:43 UTC
  To: olof, netdev; +Cc: cramerj

> We have a machine here (running a RHEL 2.4.21-based kernel), 
> that started showing leakage of PCI mappings when the driver 
> was upgraded from 5.1.11 to 5.2.20.

Olof, please try 5.1.13 and 5.2.16 from sf.net/projects/e1000 to help
narrow the diff.  I'm not seeing anything obvious in the diff between
5.1.11 and 5.2.20 that would explain this.

-scott


* Re: [2.4] e1000 leaking pci mappings on rx errors?
  2004-01-12 17:43 [2.4] e1000 leaking pci mappings on rx errors? Feldman, Scott
@ 2004-01-12 20:47 ` Olof Johansson
  0 siblings, 0 replies; 6+ messages in thread
From: Olof Johansson @ 2004-01-12 20:47 UTC
  To: Feldman, Scott; +Cc: netdev, cramerj, milliner

Feldman, Scott wrote:
>>We have a machine here (running a RHEL 2.4.21-based kernel), 
>>that started showing leakage of PCI mappings when the driver 
>>was upgraded from 5.1.11 to 5.2.20.
> 
> 
> Olof, please try 5.1.13 and 5.2.16 from sf.net/projects/e1000 to help
> narrow the diff.  I'm not seeing anything obvious in the diff between
> 5.1.11 and 5.2.20 that would explain this.

Scott,

5.1.13 is OK, 5.2.16 is leaking.

Also, I noticed it's leaking quite fast, and even before any RX errors 
are shown. So my previous guess w.r.t. cause and effect might have been 
wrong:

[root@primerib linux-2.4.21-6.EL-olof]# netstat -i
Kernel Interface table
Iface       MTU Met    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
eth1       1500   0    54793      0      0      0    67048      0      0      0 BMRU

~6240 PCI mappings had been allocated with the above statistics (eth1 is 
the problematic interface in our case).


-Olof

-- 
Olof Johansson                                        Office: 4F005/905
pSeries Linux Development                             IBM Systems Group
Email: olof@austin.ibm.com                          Phone: 512-838-9858
All opinions are my own and not those of IBM


* Re: [2.4] e1000 leaking pci mappings on rx errors?
  2004-01-13  0:35 ` Olof Johansson
@ 2004-01-13  1:01   ` Olof Johansson
  0 siblings, 0 replies; 6+ messages in thread
From: Olof Johansson @ 2004-01-13  1:01 UTC
  To: Olof Johansson; +Cc: Feldman, Scott, netdev, cramerj, milliner

Olof Johansson wrote:
> Feldman, Scott wrote:
> 
>> Ok, most of the changes to the driver are on the Tx side.  Add this to
>> 5.2.20 and let's see if we're hitting this path.  Maybe there is still
>> something wrong with the Tx unwind case where we run out of resources:
> 
> 
> It looks like I temporarily lost the machine, but I could give it one 
> run before it happened. It was bursting a huge number of those messages, 
> but I had no chance to correlate them to the number of mappings that 
> were leaked, since the machine pretty much locked up (serial console + 
> too many kernel printks = very very slow machine).

I got it back quicker than I thought. :)

It seems that after about 150k 'queue stopped' events we hit the case 
where it's leaked up to 1k pci mappings. The queue stopped messages 
start arriving as soon as the network load goes up. That's about the 
same point that the first rx errors show up too, in small numbers (<50 
total).


-Olof

-- 
Olof Johansson                                        Office: 4F005/905
pSeries Linux Development                             IBM Systems Group
Email: olof@austin.ibm.com                          Phone: 512-838-9858
All opinions are my own and not those of IBM


* Re: [2.4] e1000 leaking pci mappings on rx errors?
  2004-01-12 23:36 Feldman, Scott
@ 2004-01-13  0:35 ` Olof Johansson
  2004-01-13  1:01   ` Olof Johansson
  0 siblings, 1 reply; 6+ messages in thread
From: Olof Johansson @ 2004-01-13  0:35 UTC
  To: Feldman, Scott; +Cc: netdev, cramerj, milliner

Feldman, Scott wrote:

> Ok, most of the changes to the driver are on the Tx side.  Add this to
> 5.2.20 and let's see if we're hitting this path.  Maybe there is still
> something wrong with the Tx unwind case where we run out of resources:

It looks like I temporarily lost the machine, but I could give it one 
run before it happened. It was bursting a huge number of those messages, 
but I had no chance to correlate them to the number of mappings that 
were leaked, since the machine pretty much locked up (serial console + 
too many kernel printks = very very slow machine).
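
One way to keep a debug counter like this from flooding a slow serial
console is to count every event but print only every Nth one. The
following is a userspace sketch under that assumption, not driver code:
`event_counter`, `event_hit` and `report_queue_stop` are hypothetical
names, and `fprintf` stands in for `printk`.

```c
#include <stdio.h>

/* Hypothetical helper: counts every event, reports only every Nth. */
struct event_counter {
    unsigned long count;     /* total events seen so far */
    unsigned long interval;  /* log once per this many events; 0 = never */
};

/* Record one event. Returns 1 if this event should be logged, else 0. */
int event_hit(struct event_counter *c)
{
    c->count++;
    return c->interval != 0 && c->count % c->interval == 0;
}

/* Stand-in for the debug message; prints the running total when due. */
void report_queue_stop(struct event_counter *c)
{
    if (event_hit(c))
        fprintf(stderr, "stopping queue %lu\n", c->count);
}
```

With an interval of 1000, a burst of 150000 events would produce 150
lines of output while the counter still records every event, so the
total can be compared against the number of leaked mappings afterwards.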


-Olof

-- 
Olof Johansson                                        Office: 4F005/905
pSeries Linux Development                             IBM Systems Group
Email: olof@austin.ibm.com                          Phone: 512-838-9858
All opinions are my own and not those of IBM


* RE: [2.4] e1000 leaking pci mappings on rx errors?
@ 2004-01-12 23:36 Feldman, Scott
  2004-01-13  0:35 ` Olof Johansson
  0 siblings, 1 reply; 6+ messages in thread
From: Feldman, Scott @ 2004-01-12 23:36 UTC
  To: Olof Johansson; +Cc: netdev, cramerj, milliner

> Also, I noticed it's leaking quite fast, and even before any 
> RX errors are shown. So my previous guess w.r.t. cause and effect
> might have been wrong:

Ok, most of the changes to the driver are on the Tx side.  Add this to
5.2.20 and let's see if we're hitting this path.  Maybe there is still
something wrong with the Tx unwind case where we run out of resources:

diff -Nuarp e1000-5.2.20/src/e1000_main.c e1000-5.2.20-mod/src/e1000_main.c
--- e1000-5.2.20/src/e1000_main.c       2003-09-30 23:22:19.000000000 -0700
+++ e1000-5.2.20-mod/src/e1000_main.c   2004-01-12 16:09:06.000000000 -0800
@@ -1837,6 +1837,8 @@ e1000_xmit_frame(struct sk_buff *skb, st
        if((count = e1000_tx_map(adapter, skb, first)))
                e1000_tx_queue(adapter, count, tx_flags);
        else {
+               static int stops = 0;
+               printk(KERN_ERR "stopping queue %d\n", ++stops);
                netif_stop_queue(netdev);
                return 1;
        }
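
The "Tx unwind" concern can be illustrated with a small userspace model.
This is a sketch under assumptions, not the e1000 code: `map_pool`,
`pool_map`, `pool_unmap`, `tx_map_with_unwind` and `tx_map_leaky` are
hypothetical names, and the pool only counts mappings. It follows the
convention visible in the patch context above, where a zero return from
the mapping step means failure and the queue is stopped.

```c
#include <assert.h>

/* Toy stand-in for a finite pool of PCI (TCE) mappings. */
struct map_pool {
    int capacity;  /* total mappings available */
    int in_use;    /* mappings currently allocated */
};

/* Take one mapping; returns 0 on success, -1 if the pool is exhausted. */
int pool_map(struct map_pool *p)
{
    if (p->in_use >= p->capacity)
        return -1;
    p->in_use++;
    return 0;
}

/* Release one mapping. */
void pool_unmap(struct map_pool *p)
{
    assert(p->in_use > 0);
    p->in_use--;
}

/*
 * Map nfrags fragments for one packet (nfrags > 0 assumed).
 * Returns nfrags on success. On failure, releases every mapping
 * already taken for this packet and returns 0, so a failed
 * transmit leaves in_use unchanged.
 */
int tx_map_with_unwind(struct map_pool *p, int nfrags)
{
    int mapped;

    for (mapped = 0; mapped < nfrags; mapped++) {
        if (pool_map(p) != 0) {
            while (mapped-- > 0)
                pool_unmap(p);
            return 0;
        }
    }
    return nfrags;
}

/* Same loop without the unwind: partial mappings stay allocated. */
int tx_map_leaky(struct map_pool *p, int nfrags)
{
    for (int mapped = 0; mapped < nfrags; mapped++)
        if (pool_map(p) != 0)
            return 0;
    return nfrags;
}
```

With a pool of 4 mappings and 3 already in use, a 3-fragment packet
fails in both variants, but only the leaky one leaves the pool with 4
mappings in use. Repeated under load, that pattern would show up as a
slow leak tied to queue stops rather than to RX errors.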


* [2.4] e1000 leaking pci mappings on rx errors?
@ 2004-01-12 16:42 olof
  0 siblings, 0 replies; 6+ messages in thread
From: olof @ 2004-01-12 16:42 UTC
  To: netdev; +Cc: cramerj, scott.feldman


We have a machine here (running a RHEL 2.4.21-based kernel), that started
showing leakage of PCI mappings when the driver was upgraded from 5.1.11
to 5.2.20.

The system is a large-config ppc64 machine with 8 interfaces, each running
at or near full load. NAPI is enabled, frame size is 1500. We're seeing RX
errors on eth0, which is the only interface that is leaking TCE entries
(pci mappings).

The system is also running at full CPU load, with each interface having
its IRQ bound to an individual CPU. It's always the interface bound
to cpu0 that's showing errors (could possibly be because of rx ring
overruns?).

With the previous version (5.1.11), we were still seeing the RX errors,
but no TCE leaks.

As far as I can tell, the driver is leaking less than one mapping per
error, since there are more RX errors than total allocated TCE entries for
the interface. The number of errors after a run is in the range of 15-20k,
while the number of used entries is in the range of 3-4k.

Has anyone else seen anything like this? I noticed there's a slightly
newer e1000 driver available, but I saw no changes that seemed relevant.


-Olof

Olof Johansson                                        Office: 4F005/905
Linux on Power Development                            IBM Systems Group
Email: olof@austin.ibm.com                          Phone: 512-838-9858
All opinions are my own and not those of IBM
