* RE: [2.4] e1000 leaking pci mappings on rx errors?
From: Feldman, Scott @ 2004-01-12 17:43 UTC
To: olof, netdev; +Cc: cramerj
> We have a machine here (running a RHEL 2.4.21-based kernel),
> that started showing leakage of PCI mappings when the driver
> was upgraded from 5.1.11 to 5.2.20.
Olof, please try 5.1.13 and 5.2.16 from sf.net/projects/e1000 to help
narrow the diff. I'm not seeing anything obvious in the diff between
5.1.11 and 5.2.20 that would explain this.
-scott
* Re: [2.4] e1000 leaking pci mappings on rx errors?
From: Olof Johansson @ 2004-01-12 20:47 UTC
To: Feldman, Scott; +Cc: netdev, cramerj, milliner
Feldman, Scott wrote:
>>We have a machine here (running a RHEL 2.4.21-based kernel),
>>that started showing leakage of PCI mappings when the driver
>>was upgraded from 5.1.11 to 5.2.20.
>
>
> Olof, please try 5.1.13 and 5.2.16 from sf.net/projects/e1000 to help
> narrow the diff. I'm not seeing anything obvious in the diff between
> 5.1.11 and 5.2.20 that would explain this.
Scott,
5.1.13 is OK, 5.2.16 is leaking.
Also, I noticed it's leaking quite fast, and even before any RX errors
are shown. So my previous guess w.r.t. cause and effect might have been
wrong:
[root@primerib linux-2.4.21-6.EL-olof]# netstat -i
Kernel Interface table
Iface   MTU Met   RX-OK RX-ERR RX-DRP RX-OVR   TX-OK TX-ERR TX-DRP TX-OVR Flg
eth1   1500   0   54793      0      0      0   67048      0      0      0 BMRU
~6240 PCI mappings had been allocated with the above statistics (eth1 is
the problematic interface in our case).
-Olof
--
Olof Johansson Office: 4F005/905
pSeries Linux Development IBM Systems Group
Email: olof@austin.ibm.com Phone: 512-838-9858
All opinions are my own and not those of IBM
* Re: [2.4] e1000 leaking pci mappings on rx errors?
From: Olof Johansson @ 2004-01-13 1:01 UTC
To: Olof Johansson; +Cc: Feldman, Scott, netdev, cramerj, milliner
Olof Johansson wrote:
> Feldman, Scott wrote:
>
>> Ok, most of the changes to the driver are on the Tx side. Add this to
>> 5.2.20 and let's see if we're hitting this path. Maybe there is still
>> something wrong with the Tx unwind case where we run out of resources:
>
>
> It looks like I temporarily lost the machine, but I could give it one
> run before it happened. It was emitting a huge number of those messages,
> but I had no chance to correlate them to the number of mappings that
> were leaked, since the machine pretty much locked up (serial console +
> too many kernel printks = very very slow machine).
I got it back quicker than I thought. :)
It seems that after about 150k 'queue stopped' events we hit the case
where it's leaked up to 1k pci mappings. The queue stopped messages
start arriving as soon as the network load goes up. That's about the
same point that the first rx errors show up too, in small numbers (<50
total).
-Olof
--
Olof Johansson Office: 4F005/905
pSeries Linux Development IBM Systems Group
Email: olof@austin.ibm.com Phone: 512-838-9858
All opinions are my own and not those of IBM
* Re: [2.4] e1000 leaking pci mappings on rx errors?
From: Olof Johansson @ 2004-01-13 0:35 UTC
To: Feldman, Scott; +Cc: netdev, cramerj, milliner
Feldman, Scott wrote:
> Ok, most of the changes to the driver are on the Tx side. Add this to
> 5.2.20 and let's see if we're hitting this path. Maybe there is still
> something wrong with the Tx unwind case where we run out of resources:
It looks like I temporarily lost the machine, but I could give it one
run before it happened. It was emitting a huge number of those messages,
but I had no chance to correlate them to the number of mappings that
were leaked, since the machine pretty much locked up (serial console +
too many kernel printks = very very slow machine).
-Olof
--
Olof Johansson Office: 4F005/905
pSeries Linux Development IBM Systems Group
Email: olof@austin.ibm.com Phone: 512-838-9858
All opinions are my own and not those of IBM
* RE: [2.4] e1000 leaking pci mappings on rx errors?
From: Feldman, Scott @ 2004-01-12 23:36 UTC
To: Olof Johansson; +Cc: netdev, cramerj, milliner
> Also, I noticed it's leaking quite fast, and even before any
> RX errors are shown. So my previous guess w.r.t. cause and effect
> might have been wrong:
Ok, most of the changes to the driver are on the Tx side. Add this to
5.2.20 and let's see if we're hitting this path. Maybe there is still
something wrong with the Tx unwind case where we run out of resources:
diff -Nuarp e1000-5.2.20/src/e1000_main.c e1000-5.2.20-mod/src/e1000_main.c
--- e1000-5.2.20/src/e1000_main.c 2003-09-30 23:22:19.000000000 -0700
+++ e1000-5.2.20-mod/src/e1000_main.c 2004-01-12 16:09:06.000000000 -0800
@@ -1837,6 +1837,8 @@ e1000_xmit_frame(struct sk_buff *skb, st
 	if((count = e1000_tx_map(adapter, skb, first)))
 		e1000_tx_queue(adapter, count, tx_flags);
 	else {
+		static int stops = 0;
+		printk(KERN_ERR "stopping queue %d\n", ++stops);
 		netif_stop_queue(netdev);
 		return 1;
 	}
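
The Tx unwind case mentioned above can be modelled in userspace. The sketch below is not e1000 code: map_pool, pool_map, pool_unmap, tx_map and tx_complete are hypothetical names, and the only detail taken from the patch is that a zero return from the mapping step sends the caller down the stop-queue path. It shows how a mapping routine that fails partway through a multi-fragment packet keeps the mappings it already took unless it releases them before returning 0. Whether this is the actual cause of the leak is not established in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the platform's pool of PCI (TCE) mappings. */
struct map_pool {
    int capacity;     /* total mappings available */
    int outstanding;  /* mappings currently held */
};

/* Take one mapping; returns false when the pool is exhausted. */
static bool pool_map(struct map_pool *p)
{
    if (p->outstanding >= p->capacity)
        return false;
    p->outstanding++;
    return true;
}

/* Give one mapping back to the pool. */
static void pool_unmap(struct map_pool *p)
{
    assert(p->outstanding > 0);
    p->outstanding--;
}

/*
 * Map nfrags buffers for one packet. Returns the number mapped, or 0 if
 * the pool ran dry partway through. When unwind is false, the mappings
 * taken before the failure are deliberately left held, which models the
 * suspected leak; when true, they are released before returning.
 */
static int tx_map(struct map_pool *p, int nfrags, bool unwind)
{
    int mapped = 0;

    while (mapped < nfrags) {
        if (!pool_map(p)) {
            if (unwind) {
                while (mapped > 0) {
                    pool_unmap(p);
                    mapped--;
                }
            }
            return 0;
        }
        mapped++;
    }
    return mapped;
}

/* Release the mappings of a packet whose transmit has completed. */
static void tx_complete(struct map_pool *p, int count)
{
    while (count-- > 0)
        pool_unmap(p);
}
```

With a pool of 4 and 3 mappings already held, tx_map(&p, 3, false) takes the last free mapping, fails on the next one and returns 0 with that mapping still held, so it is never handed to tx_complete. With unwind set, the outstanding count is the same after the failed call as before it.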
* [2.4] e1000 leaking pci mappings on rx errors?
From: olof @ 2004-01-12 16:42 UTC
To: netdev; +Cc: cramerj, scott.feldman
We have a machine here (running a RHEL 2.4.21-based kernel), that started
showing leakage of PCI mappings when the driver was upgraded from 5.1.11
to 5.2.20.
The system is a large-config ppc64 machine with 8 interfaces, each running
at or near full load. NAPI is enabled, frame size is 1500. We're seeing RX
errors on eth0, which is the only interface that is leaking TCE entries
(pci mappings).
The system is also running at full cpu load, with each interface having
its irq bound to an individual CPU. It's always the interface bound
to cpu0 that's showing errors (could possibly be because of rx ring
overruns?).
With the previous version (5.1.11), we were still seeing the RX errors,
but no TCE leaks.
As far as I can tell, the driver is leaking less than one mapping per
error, since there are more RX errors than total allocated TCE entries for
the interface. The number of errors after a run is in the range of 15-20k,
while the number of used entries is in the range of 3-4k.
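
The "less than one mapping per error" estimate follows from the two ranges above. A minimal sketch of that arithmetic, assuming the 15-20k and 3-4k figures are taken as inclusive bounds (struct range and ratio_bounds are hypothetical names used only for illustration):

```c
#include <assert.h>

/* Inclusive range of observed counts. */
struct range {
    double lo;
    double hi;
};

/*
 * Bounds on entries-per-error: the smallest ratio pairs the fewest
 * entries with the most errors, the largest pairs the most entries
 * with the fewest errors.
 */
static struct range ratio_bounds(struct range entries, struct range errors)
{
    struct range r;

    assert(errors.lo > 0 && errors.hi >= errors.lo);
    r.lo = entries.lo / errors.hi;
    r.hi = entries.hi / errors.lo;
    return r;
}
```

For 3000-4000 entries and 15000-20000 errors this gives roughly 0.15 to 0.27 entries per error, so even the upper bound is well below one.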
Has anyone else seen anything like this? I noticed there's a slightly
newer e1000 driver available, but I saw no changes that seemed relevant.
-Olof
Olof Johansson Office: 4F005/905
Linux on Power Development IBM Systems Group
Email: olof@austin.ibm.com Phone: 512-838-9858
All opinions are my own and not those of IBM