All of lore.kernel.org
 help / color / mirror / Atom feed
* [2.6.20-rt8] "Neighbour table overflow."
@ 2007-03-16 16:13 Guennadi Liakhovetski
  2007-03-16 16:16 ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Guennadi Liakhovetski @ 2007-03-16 16:13 UTC (permalink / raw)
  To: linux-rt-users, irda-users

Hi

After fixing the irda spinlock init bug, here's the next one I've run 
into: after a long irnet/ppp test with a periodic disturbance, whereby the 
ppp connection is broken and restored every ~20 secnds, I've got this from 
the kernel:

user.warn kernel: Neighbour table overflow.

and the system cannot build any new outgoing network connections - ppp or 
ethernet. I can connect from outside, but from the system produces:

# ping 192.168.1.1
connect: No buffer space available

where "new" connections is to new hosts. To known hosts connections can be 
further built. And, here's another one:

# ping 192.168.1.5
PING 192.168.1.5 (192.168.1.5) 56(84) bytes of data.
64 bytes from 192.168.1.5: icmp_seq=1 ttl=64 time=3.45 ms
64 bytes from 192.168.1.5: icmp_seq=1 ttl=64 time=8.34 ms (DUP!)
64 bytes from 192.168.1.5: icmp_seq=1 ttl=64 time=9.29 ms (DUP!)
64 bytes from 192.168.1.5: icmp_seq=1 ttl=64 time=11.5 ms (DUP!)
64 bytes from 192.168.1.5: icmp_seq=2 ttl=64 time=4.79 ms
64 bytes from 192.168.1.5: icmp_seq=2 ttl=64 time=7.86 ms (DUP!)
64 bytes from 192.168.1.5: icmp_seq=2 ttl=64 time=10.8 ms (DUP!)
64 bytes from 192.168.1.5: icmp_seq=2 ttl=64 time=12.0 ms (DUP!)
64 bytes from 192.168.1.5: icmp_seq=3 ttl=64 time=4.04 ms
64 bytes from 192.168.1.5: icmp_seq=3 ttl=64 time=8.17 ms (DUP!)
64 bytes from 192.168.1.5: icmp_seq=3 ttl=64 time=11.1 ms (DUP!)
64 bytes from 192.168.1.5: icmp_seq=3 ttl=64 time=13.2 ms (DUP!)

any ideas?...

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [2.6.20-rt8] "Neighbour table overflow."
  2007-03-16 16:13 [2.6.20-rt8] "Neighbour table overflow." Guennadi Liakhovetski
@ 2007-03-16 16:16 ` Ingo Molnar
  2007-03-16 16:23   ` [irda-users] " Samuel Ortiz
  2007-03-16 16:26   ` Guennadi Liakhovetski
  0 siblings, 2 replies; 26+ messages in thread
From: Ingo Molnar @ 2007-03-16 16:16 UTC (permalink / raw)
  To: Guennadi Liakhovetski; +Cc: linux-rt-users, irda-users


* Guennadi Liakhovetski <gl@dsa-ac.de> wrote:

> Hi
> 
> After fixing the irda spinlock init bug, [...]

could you send the patch for that? Is it upstream already?

> user.warn kernel: Neighbour table overflow.

hm, i havent seen this one yet.

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [irda-users] [2.6.20-rt8] "Neighbour table overflow."
  2007-03-16 16:16 ` Ingo Molnar
@ 2007-03-16 16:23   ` Samuel Ortiz
  2007-03-16 16:26   ` Guennadi Liakhovetski
  1 sibling, 0 replies; 26+ messages in thread
From: Samuel Ortiz @ 2007-03-16 16:23 UTC (permalink / raw)
  To: mingo, gl; +Cc: irda-users, linux-rt-users


Hi Ingo,

On 3/16/2007, "Ingo Molnar" <mingo@elte.hu> wrote:

>
>* Guennadi Liakhovetski <gl@dsa-ac.de> wrote:
>
>> Hi
>>
>> After fixing the irda spinlock init bug, [...]
>
>could you send the patch for that? Is it upstream already?
Guennadi sent it to me earlier this afternoon. I will forward it upstream
later today.

Cheers,
Samuel.


>> user.warn kernel: Neighbour table overflow.
>
>hm, i havent seen this one yet.
>
>	Ingo
>
>-------------------------------------------------------------------------
>Take Surveys. Earn Cash. Influence the Future of IT
>Join SourceForge.net's Techsay panel and you'll get the chance to share your
>opinions on IT & business topics through brief surveys-and earn cash
>http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
>_______________________________________________
>irda-users mailing list
>irda-users@lists.sourceforge.net
>http://lists.sourceforge.net/lists/listinfo/irda-users

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [2.6.20-rt8] "Neighbour table overflow."
  2007-03-16 16:16 ` Ingo Molnar
  2007-03-16 16:23   ` [irda-users] " Samuel Ortiz
@ 2007-03-16 16:26   ` Guennadi Liakhovetski
  2007-03-16 16:29     ` Ingo Molnar
                       ` (2 more replies)
  1 sibling, 3 replies; 26+ messages in thread
From: Guennadi Liakhovetski @ 2007-03-16 16:26 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-rt-users, irda-users

On Fri, 16 Mar 2007, Ingo Molnar wrote:

> * Guennadi Liakhovetski <gl@dsa-ac.de> wrote:
>>
>> After fixing the irda spinlock init bug, [...]
>
> could you send the patch for that? Is it upstream already?

It's indeed an IrDA bug (though mainly visible under rt-preempt) and the 
patch has been sent to the maintainer and to stable.

>> user.warn kernel: Neighbour table overflow.
>
> hm, i havent seen this one yet.

it comes from net/ipv4/route.c::rt_intern_hash()... That's all I can say 
so far. Any debug ideas? Could you try on any your rt-test system just a 
loop like

while true; do
 	ifdown eth0
 	ifup eth0
 	ping -c 1 host
done

or similar. Or, better yet with starting / stopping pppd?

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [2.6.20-rt8] "Neighbour table overflow."
  2007-03-16 16:26   ` Guennadi Liakhovetski
@ 2007-03-16 16:29     ` Ingo Molnar
  2007-03-16 16:41     ` [irda-users] " Samuel Ortiz
  2007-03-16 19:15     ` David Miller
  2 siblings, 0 replies; 26+ messages in thread
From: Ingo Molnar @ 2007-03-16 16:29 UTC (permalink / raw)
  To: Guennadi Liakhovetski; +Cc: linux-rt-users, irda-users


* Guennadi Liakhovetski <gl@dsa-ac.de> wrote:

> > could you send the patch for that? Is it upstream already?
> 
> It's indeed an IrDA bug (though mainly visible under rt-preempt) and 
> the patch has been sent to the maintainer and to stable.

could you please bounce it to me too? I usually put such patches into 
-rt straight away and i drop them when they show up upstream.

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [irda-users] [2.6.20-rt8] "Neighbour table overflow."
  2007-03-16 16:26   ` Guennadi Liakhovetski
  2007-03-16 16:29     ` Ingo Molnar
@ 2007-03-16 16:41     ` Samuel Ortiz
  2007-03-16 16:45       ` Samuel Ortiz
  2007-03-16 19:15     ` David Miller
  2 siblings, 1 reply; 26+ messages in thread
From: Samuel Ortiz @ 2007-03-16 16:41 UTC (permalink / raw)
  To: gl, mingo; +Cc: irda-users, linux-rt-users


Reply from Ingo, got mistakenly discarded by the irda-users mailing list:

* Guennadi Liakhovetski <gl@dsa-ac.de> wrote:

> > could you send the patch for that? Is it upstream already?
>
> It's indeed an IrDA bug (though mainly visible under rt-preempt) and
> the patch has been sent to the maintainer and to stable.

could you please bounce it to me too? I usually put such patches into
-rt straight away and i drop them when they show up upstream.

Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [irda-users] [2.6.20-rt8] "Neighbour table overflow."
  2007-03-16 16:41     ` [irda-users] " Samuel Ortiz
@ 2007-03-16 16:45       ` Samuel Ortiz
  0 siblings, 0 replies; 26+ messages in thread
From: Samuel Ortiz @ 2007-03-16 16:45 UTC (permalink / raw)
  To: gl, mingo; +Cc: irda-users, linux-rt-users


Here you go, Ingo:

 Index: net/irda/irttp.c
===================================================================
RCS file: /home/dsa/cvs/linux-2_6/net/irda/irttp.c,v
retrieving revision 1.1.1.11
diff -u -r1.1.1.11 irttp.c
--- a/net/irda/irttp.c 5 Mar 2007 13:18:59 -0000 1.1.1.11
+++ b/net/irda/irttp.c 16 Mar 2007 11:04:31 -0000
@@ -1455,6 +1468,7 @@

/* Not everything should be copied */
new->notify.instance = instance;
+ spin_lock_init(&new->lock);
init_timer(&new->todo_timer);

skb_queue_head_init(&new->rx_queue);


On 3/16/2007, "Samuel Ortiz" <samuel@sortiz.org> wrote:

>
>Reply from Ingo, got mistakenly discarded by the irda-users mailing list:
>
>* Guennadi Liakhovetski <gl@dsa-ac.de> wrote:
>
>> > could you send the patch for that? Is it upstream already?
>>
>> It's indeed an IrDA bug (though mainly visible under rt-preempt) and
>> the patch has been sent to the maintainer and to stable.
>
>could you please bounce it to me too? I usually put such patches into
>-rt straight away and i drop them when they show up upstream.
>
>Ingo
>
>-------------------------------------------------------------------------
>Take Surveys. Earn Cash. Influence the Future of IT
>Join SourceForge.net's Techsay panel and you'll get the chance to share your
>opinions on IT & business topics through brief surveys-and earn cash
>http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
>_______________________________________________
>irda-users mailing list
>irda-users@lists.sourceforge.net
>http://lists.sourceforge.net/lists/listinfo/irda-users

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [2.6.20-rt8] "Neighbour table overflow."
  2007-03-16 16:26   ` Guennadi Liakhovetski
  2007-03-16 16:29     ` Ingo Molnar
  2007-03-16 16:41     ` [irda-users] " Samuel Ortiz
@ 2007-03-16 19:15     ` David Miller
       [not found]       ` <20070316.121539.78711178.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  2 siblings, 1 reply; 26+ messages in thread
From: David Miller @ 2007-03-16 19:15 UTC (permalink / raw)
  To: gl; +Cc: mingo, linux-rt-users, irda-users

From: Guennadi Liakhovetski <gl@dsa-ac.de>
Date: Fri, 16 Mar 2007 17:26:57 +0100 (CET)

> it comes from net/ipv4/route.c::rt_intern_hash()... That's all I can say 
> so far. Any debug ideas? Could you try on any your rt-test system just a 
> loop like

There is an SKB leak somewhere, and that leaked SKB has references to
a route, that's what that message usually means.

After you accumulate enough leaked packets with routed attached them,
the system refuses to create any more of them.

Probably if you look at /proc/slabinfo you'll see a lot of allocated
sk_buff and rtinfo objects.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [2.6.20-rt8] "Neighbour table overflow."
       [not found]       ` <20070316.121539.78711178.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2007-03-19 15:21         ` Guennadi Liakhovetski
  2007-03-20 11:18           ` 2.6.20-rt8 APIC error joel silvestre
       [not found]           ` <Pine.LNX.4.63.0703191617250.534-pjC7uWgG1yLsj6RlE8knwQ@public.gmane.org>
  0 siblings, 2 replies; 26+ messages in thread
From: Guennadi Liakhovetski @ 2007-03-19 15:21 UTC (permalink / raw)
  To: David Miller
  Cc: irda-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, mingo-X9Un+BFzKDI,
	linux-rt-users-u79uwXL29TY76Z2rM5mHXA

On Fri, 16 Mar 2007, David Miller wrote:

> From: Guennadi Liakhovetski <gl-o/hVf8ie6tKzQB+pC5nmwQ@public.gmane.org>
> Date: Fri, 16 Mar 2007 17:26:57 +0100 (CET)
>
>> it comes from net/ipv4/route.c::rt_intern_hash()... That's all I can say
>> so far. Any debug ideas? Could you try on any your rt-test system just a
>> loop like
>
> There is an SKB leak somewhere, and that leaked SKB has references to
> a route, that's what that message usually means.

Do you have an idea how to debug this? I tried just replacing all 
(dev_)alloc_skb and (dev_)kfree_skb in net/irda and drivers/net/irda, and 
I get for each skb address more free than alloc calls...

> After you accumulate enough leaked packets with routed attached them,
> the system refuses to create any more of them.

What does this mean - packets with routing attached to them? Are these 
only some specific packets and the rest don't have this? can I detect 
them?

> Probably if you look at /proc/slabinfo you'll see a lot of allocated
> sk_buff and rtinfo objects.

skbuff_head_cache grows

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV

^ permalink raw reply	[flat|nested] 26+ messages in thread

* 2.6.20-rt8  APIC error
  2007-03-19 15:21         ` Guennadi Liakhovetski
@ 2007-03-20 11:18           ` joel silvestre
       [not found]           ` <Pine.LNX.4.63.0703191617250.534-pjC7uWgG1yLsj6RlE8knwQ@public.gmane.org>
  1 sibling, 0 replies; 26+ messages in thread
From: joel silvestre @ 2007-03-20 11:18 UTC (permalink / raw)
  To: linux-rt-users


Hello, 

found this for the first and only time this morning:


Mar 20 11:42:32 localhost powersaved: Loading acpi_cpufreq
Mar 20 11:42:32 localhost kernel:  [<c0104f7a>] dump_trace+0x63/0x1e5
Mar 20 11:42:32 localhost kernel:  [<c0105116>] show_trace_log_lvl
+0x1a/0x2f
Mar 20 11:42:32 localhost kernel:  [<c0105731>] show_trace+0x12/0x14
Mar 20 11:42:32 localhost kernel:  [<c01057c4>] dump_stack+0x16/0x18
Mar 20 11:42:32 localhost kernel:  [<c010ef6e>] smp_error_interrupt
+0x55/0x5c
Mar 20 11:42:32 localhost kernel:  [<c01049d0>] error_interrupt
+0x28/0x30
Mar 20 11:42:32 localhost kernel:  [<c01751cf>] shrink_dcache_parent
+0x3c/0xdc
Mar 20 11:42:32 localhost kernel:  [<c016b997>] dentry_unhash+0x1e/0x72
Mar 20 11:42:32 localhost kernel:  [<c016cb2e>] vfs_rmdir+0x79/0xf4
Mar 20 11:42:32 localhost kernel:  [<c016e634>] do_rmdir+0x89/0xbb
Mar 20 11:42:32 localhost kernel:  [<c016e6a5>] sys_rmdir+0x10/0x12
Mar 20 11:42:32 localhost kernel:  [<c0103f40>] syscall_call+0x7/0xb
Mar 20 11:42:32 localhost kernel:  [<b7f322ed>] 0xb7f322ed
Mar 20 11:42:32 localhost kernel:  =======================
Mar 20 11:42:32 localhost powersaved: enter 'acpi_cpufreq' into
CPUFREQD_MODULE in /etc/powersave/cpufreq.

joel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [2.6.20-rt8] "Neighbour table overflow."
       [not found]           ` <Pine.LNX.4.63.0703191617250.534-pjC7uWgG1yLsj6RlE8knwQ@public.gmane.org>
@ 2007-03-20 15:21             ` Guennadi Liakhovetski
  2007-03-20 15:30               ` [irda-users] " Guennadi Liakhovetski
  0 siblings, 1 reply; 26+ messages in thread
From: Guennadi Liakhovetski @ 2007-03-20 15:21 UTC (permalink / raw)
  To: David Miller
  Cc: irda-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, mingo-X9Un+BFzKDI,
	linux-rt-users-u79uwXL29TY76Z2rM5mHXA

On Mon, 19 Mar 2007, Guennadi Liakhovetski wrote:

> On Fri, 16 Mar 2007, David Miller wrote:
>>
>> There is an SKB leak somewhere, and that leaked SKB has references to
>> a route, that's what that message usually means.
>
> Do you have an idea how to debug this? I tried just replacing all 
> (dev_)alloc_skb and (dev_)kfree_skb in net/irda and drivers/net/irda, and I 
> get for each skb address more free than alloc calls...

I wrote a small hashing debugger - from __aloc_skb I store the new skb on 
a hash, from __kfree_skb I remove it from it. The first strange thing that 
occurred to me is that __kfree_skb is very often called on already freed 
skbs. Is it normal? It seems strange as trying to free memory that you no 
longer own can free a newly allocated object at that address.

As for the original problem, looks like the packets that get leaked are 
from af_irda.c::irda_sendmsg(). Any ideas where and when these skbs should 
be freed?

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [irda-users] [2.6.20-rt8] "Neighbour table overflow."
  2007-03-20 15:21             ` [2.6.20-rt8] "Neighbour table overflow." Guennadi Liakhovetski
@ 2007-03-20 15:30               ` Guennadi Liakhovetski
  2007-03-20 16:23                 ` Samuel Ortiz
  0 siblings, 1 reply; 26+ messages in thread
From: Guennadi Liakhovetski @ 2007-03-20 15:30 UTC (permalink / raw)
  To: David Miller; +Cc: irda-users, mingo, linux-rt-users

On Tue, 20 Mar 2007, Guennadi Liakhovetski wrote:

> As for the original problem, looks like the packets that get leaked are
> from af_irda.c::irda_sendmsg(). Any ideas where and when these skbs should
> be freed?

No, I actually only see that they are created in sock_alloc_send_skb(), 
so, I assumed it was the call in af_irda.c::irda_sendmsg(), which, 
however, seems to be not the case now.

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [irda-users] [2.6.20-rt8] "Neighbour table overflow."
  2007-03-20 15:30               ` [irda-users] " Guennadi Liakhovetski
@ 2007-03-20 16:23                 ` Samuel Ortiz
  2007-03-20 17:02                   ` Guennadi Liakhovetski
  0 siblings, 1 reply; 26+ messages in thread
From: Samuel Ortiz @ 2007-03-20 16:23 UTC (permalink / raw)
  To: gl, davem; +Cc: irda-users, mingo, linux-rt-users


On 3/20/2007, "Guennadi Liakhovetski" <gl@dsa-ac.de> wrote:

>On Tue, 20 Mar 2007, Guennadi Liakhovetski wrote:
>
>> As for the original problem, looks like the packets that get leaked are
>> from af_irda.c::irda_sendmsg(). Any ideas where and when these skbs should
>> be freed?
>
>No, I actually only see that they are created in sock_alloc_send_skb(),
>so, I assumed it was the call in af_irda.c::irda_sendmsg(), which,
>however, seems to be not the case now.
Well, if you're using IrNET, then those sock_alloc_send_skb() calls
should come from the IP code (net/ipv4/af_inet.c). Do you have a way to
verify that ?
The skb should eventually be freed by the IrDA hardware driver, when the
frame is actually transmitted. Are you using the pxaficp_ir driver ?

Cheers,
Samuel.

P.S.: I propose we remove the -rt people and Mingo from this discussion
until we know for sure that it's an RT bug. If no one objects, I'll
remove them from the CC list.


>Thanks
>Guennadi
>---------------------------------
>Guennadi Liakhovetski, Ph.D.
>DSA Daten- und Systemtechnik GmbH
>Pascalstr. 28
>D-52076 Aachen
>Germany
>
>-------------------------------------------------------------------------
>Take Surveys. Earn Cash. Influence the Future of IT
>Join SourceForge.net's Techsay panel and you'll get the chance to share your
>opinions on IT & business topics through brief surveys-and earn cash
>http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
>_______________________________________________
>irda-users mailing list
>irda-users@lists.sourceforge.net
>http://lists.sourceforge.net/lists/listinfo/irda-users

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [irda-users] [2.6.20-rt8] "Neighbour table overflow."
  2007-03-20 16:23                 ` Samuel Ortiz
@ 2007-03-20 17:02                   ` Guennadi Liakhovetski
       [not found]                     ` <Pine.LNX.4.63.0703210849570.534@pcgl.dsa-ac.de>
  0 siblings, 1 reply; 26+ messages in thread
From: Guennadi Liakhovetski @ 2007-03-20 17:02 UTC (permalink / raw)
  To: Samuel Ortiz; +Cc: davem, irda-users, mingo, linux-rt-users

On Tue, 20 Mar 2007, Samuel Ortiz wrote:

> Well, if you're using IrNET, then those sock_alloc_send_skb() calls
> should come from the IP code (net/ipv4/af_inet.c). Do you have a way to
> verify that ?

The test is running atm. If I've done everything right, I'll have the 
result tomorrow.

> The skb should eventually be freed by the IrDA hardware driver, when the
> frame is actually transmitted. Are you using the pxaficp_ir driver ?

Hardware driver?... hm, interesting. Yes, it's pxaficp_ir.

> P.S.: I propose we remove the -rt people and Mingo from this discussion
> until we know for sure that it's an RT bug. If no one objects, I'll
> remove them from the CC list.

No objections.

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [2.6.20-rt8] "Neighbour table overflow."
       [not found]                       ` <Pine.LNX.4.63.0703210849570.534-pjC7uWgG1yLsj6RlE8knwQ@public.gmane.org>
@ 2007-03-21 10:51                         ` Guennadi Liakhovetski
       [not found]                           ` <Pine.LNX.4.63.0703211130140.534-pjC7uWgG1yLsj6RlE8knwQ@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Guennadi Liakhovetski @ 2007-03-21 10:51 UTC (permalink / raw)
  To: Samuel Ortiz
  Cc: irda-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, netdev-u79uwXL29TY76Z2rM5mHXA

(Short recap for newly added to cc: netdev: I'm seeing an skb leak in 
2.6.20 during an IrDA IrNET+ppp UDP test with periodic connection 
disruptions)

On Wed, 21 Mar 2007, Guennadi Liakhovetski wrote:

> On Tue, 20 Mar 2007, Guennadi Liakhovetski wrote:
>
> Ok, looks like all leaked skbuffs come from ip_append_data(), like this:
>
> (sock_alloc_send_skb+0x2c8/0x2e4)
> (ip_append_data+0x7fc/0xa80)
> (udp_sendmsg+0x248/0x68c)
> (inet_sendmsg+0x60/0x64)
> (sock_sendmsg+0xb4/0xe4)
>  r4 = C3CB4960
> (sys_sendto+0xc8/0xf0)
>  r4 = 00000000
> (sys_socketcall+0x168/0x1f0)
> (ret_fast_syscall+0x0/0x2c)

This call to sock_alloc_send_skb() in ip_append_data() is not from the 
inlined ip_ufo_append_data(), it is here:

 			/* The last fragment gets additional space at tail.
 			 * Note, with MSG_MORE we overallocate on fragments,
 			 * because we have no idea what fragment will be
 			 * the last.
 			 */
 			if (datalen == length + fraggap)
 				alloclen += rt->u.dst.trailer_len;

 			if (transhdrlen) {
 				skb = sock_alloc_send_skb(sk,
 						alloclen + hh_len + 15,
 						(flags & MSG_DONTWAIT), &err);
 			} else {

Then, I traced a couple of paths how such a skbuff, coming down from 
ip_append_data() and allocated above get freed (when they do):

[<c0182380>] (__kfree_skb+0x0/0x170) from [<c0182514>] (kfree_skb+0x24/0x50)
  r5 = C332BC00  r4 = C332BC00 
[<c01824f0>] (kfree_skb+0x0/0x50) from [<bf0fac58>] (irlap_update_nr_received+0x94/0xc8 [irda])
[<bf0fabc4>] (irlap_update_nr_received+0x0/0xc8 [irda]) from [<bf0fda98>] (irlap_state_nrm_p+0x530/0x7c0 [irda])
  r7 = 00000001  r6 = C0367EC0  r5 = C332BC00  r4 = 00000000
[<bf0fd568>] (irlap_state_nrm_p+0x0/0x7c0 [irda]) from [<bf0fbd90>] (irlap_do_event+0x68/0x18c [irda])
[<bf0fbd28>] (irlap_do_event+0x0/0x18c [irda]) from [<bf1008cc>] (irlap_driver_rcv+0x1f0/0xd38 [irda])
[<bf1006dc>] (irlap_driver_rcv+0x0/0xd38 [irda]) from [<c01892c0>] (netif_receive_skb+0x244/0x338)
[<c018907c>] (netif_receive_skb+0x0/0x338) from [<c0189468>] (process_backlog+0xb4/0x194)
[<c01893b4>] (process_backlog+0x0/0x194) from [<c01895f8>] (net_rx_action+0xb0/0x210)
[<c0189548>] (net_rx_action+0x0/0x210) from [<c0042f7c>] (ksoftirqd+0x108/0x1cc)
[<c0042e74>] (ksoftirqd+0x0/0x1cc) from [<c0053614>] (kthread+0x10c/0x138)
[<c0053508>] (kthread+0x0/0x138) from [<c003f918>] (do_exit+0x0/0x8b0)
  r8 = 00000000  r7 = 00000000  r6 = 00000000  r5 = 00000000
  r4 = 00000000

and

[<c0182380>] (__kfree_skb+0x0/0x170) from [<c0182514>] (kfree_skb+0x24/0x50)
  r5 = C03909E0  r4 = C1A97400 
[<c01824f0>] (kfree_skb+0x0/0x50) from [<c0199bf8>] (pfifo_fast_enqueue+0xb4/0xd0)
[<c0199b44>] (pfifo_fast_enqueue+0x0/0xd0) from [<c0188c30>] (dev_queue_xmit+0x17c/0x25c)
  r8 = C1A2DCE0  r7 = FFFFFFF4  r6 = C3393114  r5 = C03909E0
  r4 = C3393000 
[<c0188ab4>] (dev_queue_xmit+0x0/0x25c) from [<c01a7c18>] (ip_output+0x150/0x254)
  r7 = C3717120  r6 = C03909E0  r5 = 00000000  r4 = C1A2DCE0
[<c01a7ac8>] (ip_output+0x0/0x254) from [<c01a93d0>] (ip_push_pending_frames+0x368/0x4d4)
[<c01a9068>] (ip_push_pending_frames+0x0/0x4d4) from [<c01c6954>] (udp_push_pending_frames+0x14c/0x310)
[<c01c6808>] (udp_push_pending_frames+0x0/0x310) from [<c01c70d8>] (udp_sendmsg+0x5c0/0x690)
[<c01c6b18>] (udp_sendmsg+0x0/0x690) from [<c01ceafc>] (inet_sendmsg+0x60/0x64)
[<c01cea9c>] (inet_sendmsg+0x0/0x64) from [<c017c970>] (sock_sendmsg+0xb4/0xe4)
  r7 = C2CEFDF4  r6 = 00000064  r5 = C2CEFEA8  r4 = C3C94080
[<c017c8bc>] (sock_sendmsg+0x0/0xe4) from [<c017dd9c>] (sys_sendto+0xc8/0xf0)
  r7 = 00000064  r6 = C3571580  r5 = C2CEFEC4  r4 = 00000000
[<c017dcd4>] (sys_sendto+0x0/0xf0) from [<c017e654>] (sys_socketcall+0x168/0x1f0)
[<c017e4ec>] (sys_socketcall+0x0/0x1f0) from [<c001ff40>] (ret_fast_syscall+0x0/0x2c)
  r5 = 00415344  r4 = 00000000

I would be greatful for any hints how I can identify which skbuff's get 
lost and why, and where and who should free them.

I am not subscribed to netdev, please keep in cc.

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [2.6.20-rt8] "Neighbour table overflow."
       [not found]                           ` <Pine.LNX.4.63.0703211130140.534-pjC7uWgG1yLsj6RlE8knwQ@public.gmane.org>
@ 2007-03-21 11:26                             ` Samuel Ortiz
  2007-03-21 12:01                               ` [irda-users] " Guennadi Liakhovetski
  0 siblings, 1 reply; 26+ messages in thread
From: Samuel Ortiz @ 2007-03-21 11:26 UTC (permalink / raw)
  To: gl-o/hVf8ie6tKzQB+pC5nmwQ
  Cc: irda-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, netdev-u79uwXL29TY76Z2rM5mHXA


On 3/21/2007, "Guennadi Liakhovetski" <gl-o/hVf8ie6tKzQB+pC5nmwQ@public.gmane.org> wrote:

>(Short recap for newly added to cc: netdev: I'm seeing an skb leak in
>2.6.20 during an IrDA IrNET+ppp UDP test with periodic connection
>disruptions)
>
>On Wed, 21 Mar 2007, Guennadi Liakhovetski wrote:
>
>> On Tue, 20 Mar 2007, Guennadi Liakhovetski wrote:
>>
>> Ok, looks like all leaked skbuffs come from ip_append_data(), like this:
>>
>> (sock_alloc_send_skb+0x2c8/0x2e4)
>> (ip_append_data+0x7fc/0xa80)
>> (udp_sendmsg+0x248/0x68c)
>> (inet_sendmsg+0x60/0x64)
>> (sock_sendmsg+0xb4/0xe4)
>>  r4 = C3CB4960
>> (sys_sendto+0xc8/0xf0)
>>  r4 = 00000000
>> (sys_socketcall+0x168/0x1f0)
>> (ret_fast_syscall+0x0/0x2c)
>
>This call to sock_alloc_send_skb() in ip_append_data() is not from the
>inlined ip_ufo_append_data(), it is here:
>
> 			/* The last fragment gets additional space at tail.
> 			 * Note, with MSG_MORE we overallocate on fragments,
> 			 * because we have no idea what fragment will be
> 			 * the last.
> 			 */
> 			if (datalen == length + fraggap)
> 				alloclen += rt->u.dst.trailer_len;
>
> 			if (transhdrlen) {
> 				skb = sock_alloc_send_skb(sk,
> 						alloclen + hh_len + 15,
> 						(flags & MSG_DONTWAIT), &err);
> 			} else {
>
>Then, I traced a couple of paths how such a skbuff, coming down from
>ip_append_data() and allocated above get freed (when they do):
>
>[<c0182380>] (__kfree_skb+0x0/0x170) from [<c0182514>] (kfree_skb+0x24/0x50)
>  r5 = C332BC00  r4 = C332BC00
>[<c01824f0>] (kfree_skb+0x0/0x50) from [<bf0fac58>] (irlap_update_nr_received+0x94/0xc8 [irda])
>[<bf0fabc4>] (irlap_update_nr_received+0x0/0xc8 [irda]) from [<bf0fda98>] (irlap_state_nrm_p+0x530/0x7c0 [irda])
>  r7 = 00000001  r6 = C0367EC0  r5 = C332BC00  r4 = 00000000
>[<bf0fd568>] (irlap_state_nrm_p+0x0/0x7c0 [irda]) from [<bf0fbd90>] (irlap_do_event+0x68/0x18c [irda])
>[<bf0fbd28>] (irlap_do_event+0x0/0x18c [irda]) from [<bf1008cc>] (irlap_driver_rcv+0x1f0/0xd38 [irda])
>[<bf1006dc>] (irlap_driver_rcv+0x0/0xd38 [irda]) from [<c01892c0>] (netif_receive_skb+0x244/0x338)
>[<c018907c>] (netif_receive_skb+0x0/0x338) from [<c0189468>] (process_backlog+0xb4/0x194)
>[<c01893b4>] (process_backlog+0x0/0x194) from [<c01895f8>] (net_rx_action+0xb0/0x210)
>[<c0189548>] (net_rx_action+0x0/0x210) from [<c0042f7c>] (ksoftirqd+0x108/0x1cc)
>[<c0042e74>] (ksoftirqd+0x0/0x1cc) from [<c0053614>] (kthread+0x10c/0x138)
>[<c0053508>] (kthread+0x0/0x138) from [<c003f918>] (do_exit+0x0/0x8b0)
>  r8 = 00000000  r7 = 00000000  r6 = 00000000  r5 = 00000000
>  r4 = 00000000
This is the IrDA RX path, so I doubt the corresponding skb ever got
through
ip_append_data(). The skb was allocated by your HW driver upon packet
reception, then queued to the net input queue, and finally passed to the
IrDA stack. Are you sure your tracing is correct ?


>and
>
>[<c0182380>] (__kfree_skb+0x0/0x170) from [<c0182514>] (kfree_skb+0x24/0x50)
>  r5 = C03909E0  r4 = C1A97400
>[<c01824f0>] (kfree_skb+0x0/0x50) from [<c0199bf8>] (pfifo_fast_enqueue+0xb4/0xd0)
>[<c0199b44>] (pfifo_fast_enqueue+0x0/0xd0) from [<c0188c30>] (dev_queue_xmit+0x17c/0x25c)
>  r8 = C1A2DCE0  r7 = FFFFFFF4  r6 = C3393114  r5 = C03909E0
>  r4 = C3393000
>[<c0188ab4>] (dev_queue_xmit+0x0/0x25c) from [<c01a7c18>] (ip_output+0x150/0x254)
>  r7 = C3717120  r6 = C03909E0  r5 = 00000000  r4 = C1A2DCE0
>[<c01a7ac8>] (ip_output+0x0/0x254) from [<c01a93d0>] (ip_push_pending_frames+0x368/0x4d4)
>[<c01a9068>] (ip_push_pending_frames+0x0/0x4d4) from [<c01c6954>] (udp_push_pending_frames+0x14c/0x310)
>[<c01c6808>] (udp_push_pending_frames+0x0/0x310) from [<c01c70d8>] (udp_sendmsg+0x5c0/0x690)
>[<c01c6b18>] (udp_sendmsg+0x0/0x690) from [<c01ceafc>] (inet_sendmsg+0x60/0x64)
>[<c01cea9c>] (inet_sendmsg+0x0/0x64) from [<c017c970>] (sock_sendmsg+0xb4/0xe4)
>  r7 = C2CEFDF4  r6 = 00000064  r5 = C2CEFEA8  r4 = C3C94080
>[<c017c8bc>] (sock_sendmsg+0x0/0xe4) from [<c017dd9c>] (sys_sendto+0xc8/0xf0)
>  r7 = 00000064  r6 = C3571580  r5 = C2CEFEC4  r4 = 00000000
>[<c017dcd4>] (sys_sendto+0x0/0xf0) from [<c017e654>] (sys_socketcall+0x168/0x1f0)
>[<c017e4ec>] (sys_socketcall+0x0/0x1f0) from [<c001ff40>] (ret_fast_syscall+0x0/0x2c)
>  r5 = 00415344  r4 = 00000000
This one is on the TX path, yes. However it got dropped and freed because
your TX queue was full. Any idea in which situation does that happen ?


>I would be greatful for any hints how I can identify which skbuff's get
>lost and why, and where and who should free them.
You're seeing skb leaks when cutting the ppp connection periodically,
right ?
Do you such leaks when not cutting the ppp connection ?
If not, could you send me a kernel trace (with irda debug set to 5) when
the ppp connection is shut down ? It would narrow down the problem a bit.
I'm quite sure the leak is in the IrDA code rather than in the ppp or
ipv4
one, hence the need for full irda debug...

Cheers,
Samuel.

>I am not subscribed to netdev, please keep in cc.
>
>Thanks
>Guennadi
>---------------------------------
>Guennadi Liakhovetski, Ph.D.
>DSA Daten- und Systemtechnik GmbH
>Pascalstr. 28
>D-52076 Aachen
>Germany

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [irda-users] [2.6.20-rt8] "Neighbour table overflow."
  2007-03-21 11:26                             ` Samuel Ortiz
@ 2007-03-21 12:01                               ` Guennadi Liakhovetski
  2007-03-23 12:14                                 ` Guennadi Liakhovetski
  0 siblings, 1 reply; 26+ messages in thread
From: Guennadi Liakhovetski @ 2007-03-21 12:01 UTC (permalink / raw)
  To: Samuel Ortiz; +Cc: irda-users, davem, netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4848 bytes --]

On Wed, 21 Mar 2007, Samuel Ortiz wrote:

> On 3/21/2007, "Guennadi Liakhovetski" <gl@dsa-ac.de> wrote:
>
>> [<c0182380>] (__kfree_skb+0x0/0x170) from [<c0182514>] (kfree_skb+0x24/0x50)
>>  r5 = C332BC00  r4 = C332BC00
>> [<c01824f0>] (kfree_skb+0x0/0x50) from [<bf0fac58>] (irlap_update_nr_received+0x94/0xc8 [irda])
>> [<bf0fabc4>] (irlap_update_nr_received+0x0/0xc8 [irda]) from [<bf0fda98>] (irlap_state_nrm_p+0x530/0x7c0 [irda])
>>  r7 = 00000001  r6 = C0367EC0  r5 = C332BC00  r4 = 00000000
>> [<bf0fd568>] (irlap_state_nrm_p+0x0/0x7c0 [irda]) from [<bf0fbd90>] (irlap_do_event+0x68/0x18c [irda])
>> [<bf0fbd28>] (irlap_do_event+0x0/0x18c [irda]) from [<bf1008cc>] (irlap_driver_rcv+0x1f0/0xd38 [irda])
>> [<bf1006dc>] (irlap_driver_rcv+0x0/0xd38 [irda]) from [<c01892c0>] (netif_receive_skb+0x244/0x338)
>> [<c018907c>] (netif_receive_skb+0x0/0x338) from [<c0189468>] (process_backlog+0xb4/0x194)
>> [<c01893b4>] (process_backlog+0x0/0x194) from [<c01895f8>] (net_rx_action+0xb0/0x210)
>> [<c0189548>] (net_rx_action+0x0/0x210) from [<c0042f7c>] (ksoftirqd+0x108/0x1cc)
>> [<c0042e74>] (ksoftirqd+0x0/0x1cc) from [<c0053614>] (kthread+0x10c/0x138)
>> [<c0053508>] (kthread+0x0/0x138) from [<c003f918>] (do_exit+0x0/0x8b0)
>>  r8 = 00000000  r7 = 00000000  r6 = 00000000  r5 = 00000000
>>  r4 = 00000000
> This is the IrDA RX path, so I doubt the corresponding skb ever got
> through
> ip_append_data(). The skb was allocated by your HW driver upon packet
> reception, then queued to the net input queue, and finally passed to the
> IrDA stack. Are you sure your tracing is correct ?

I've added a bitfield to struct sk_buff:

  	__u8			pkt_type:3,
  				fclone:2,
-				ipvs_property:1;
+				ipvs_property:1,
+				trace_dbg:1;

and I set itin ip_append_data() before sock_alloc_send_skb() is called. 
Then I check this bit in __kfree_skb(). The bit is set to 0 in __alloc_skb 
per

 	memset(skb, 0, offsetof(struct sk_buff, truesize));

So, if it was a freshly allocated skb, the tracing should be correct.

>> [<c0182380>] (__kfree_skb+0x0/0x170) from [<c0182514>] (kfree_skb+0x24/0x50)
>>  r5 = C03909E0  r4 = C1A97400
>> [<c01824f0>] (kfree_skb+0x0/0x50) from [<c0199bf8>] (pfifo_fast_enqueue+0xb4/0xd0)
>> [<c0199b44>] (pfifo_fast_enqueue+0x0/0xd0) from [<c0188c30>] (dev_queue_xmit+0x17c/0x25c)
>>  r8 = C1A2DCE0  r7 = FFFFFFF4  r6 = C3393114  r5 = C03909E0
>>  r4 = C3393000
>> [<c0188ab4>] (dev_queue_xmit+0x0/0x25c) from [<c01a7c18>] (ip_output+0x150/0x254)
>>  r7 = C3717120  r6 = C03909E0  r5 = 00000000  r4 = C1A2DCE0
>> [<c01a7ac8>] (ip_output+0x0/0x254) from [<c01a93d0>] (ip_push_pending_frames+0x368/0x4d4)
>> [<c01a9068>] (ip_push_pending_frames+0x0/0x4d4) from [<c01c6954>] (udp_push_pending_frames+0x14c/0x310)
>> [<c01c6808>] (udp_push_pending_frames+0x0/0x310) from [<c01c70d8>] (udp_sendmsg+0x5c0/0x690)
>> [<c01c6b18>] (udp_sendmsg+0x0/0x690) from [<c01ceafc>] (inet_sendmsg+0x60/0x64)
>> [<c01cea9c>] (inet_sendmsg+0x0/0x64) from [<c017c970>] (sock_sendmsg+0xb4/0xe4)
>>  r7 = C2CEFDF4  r6 = 00000064  r5 = C2CEFEA8  r4 = C3C94080
>> [<c017c8bc>] (sock_sendmsg+0x0/0xe4) from [<c017dd9c>] (sys_sendto+0xc8/0xf0)
>>  r7 = 00000064  r6 = C3571580  r5 = C2CEFEC4  r4 = 00000000
>> [<c017dcd4>] (sys_sendto+0x0/0xf0) from [<c017e654>] (sys_socketcall+0x168/0x1f0)
>> [<c017e4ec>] (sys_socketcall+0x0/0x1f0) from [<c001ff40>] (ret_fast_syscall+0x0/0x2c)
>>  r5 = 00415344  r4 = 00000000
> This one is on the TX path, yes. However it got dropped and freed because
> your TX queue was full. Any idea in which situation does that happen ?

No. I can only describe what communication is running while ppp is 
disrupted - it's just some sort of udp mirror test - udp packets are sent 
one after another and mirrored back.

>> I would be greatful for any hints how I can identify which skbuff's get
>> lost and why, and where and who should free them.
> You're seeing skb leaks when cutting the ppp connection periodically,
> right ?

Right

> Do you such leaks when not cutting the ppp connection ?

Looks like I don't.

> If not, could you send me a kernel trace (with irda debug set to 5) when
> the ppp connection is shut down ? It would narrow down the problem a bit.

Attached bzipped... It's a complete log starting from irda up, running udp 
packets over the link, closing the link and bringing irda completely down.

> I'm quite sure the leak is in the IrDA code rather than in the ppp or
> ipv4 one, hence the need for full irda debug...

Likely, yes. Why I am asking netdev guys for help is just because I have 
very little idea about the data flow in the network stack(s). And the more 
experienced eyes we have on the problem the sooner we might solve it, I 
hope...

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

[-- Attachment #2: Type: APPLICATION/octet-stream, Size: 11010 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [irda-users] [2.6.20-rt8] "Neighbour table overflow."
  2007-03-21 12:01                               ` [irda-users] " Guennadi Liakhovetski
@ 2007-03-23 12:14                                 ` Guennadi Liakhovetski
  2007-03-24  0:36                                   ` Samuel Ortiz
                                                     ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Guennadi Liakhovetski @ 2007-03-23 12:14 UTC (permalink / raw)
  To: Samuel Ortiz; +Cc: irda-users, davem, netdev, Paul Mackerras

On Wed, 21 Mar 2007, Guennadi Liakhovetski wrote:

> On Wed, 21 Mar 2007, Samuel Ortiz wrote:
>
>> I'm quite sure the leak is in the IrDA code rather than in the ppp or
>> ipv4 one, hence the need for full irda debug...

Well, looks like you were wrong, Samuel. Below is a patch that fixes ONE 
sk_buff leak (maintainer added to cc: hi, Paul:-)). Still investigating if 
there are more there.

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

Don't leak an sk_buff on interface destruction.

Signed-off-by: G. Liakhovetski <gl@dsa-ac.de>

--- a/drivers/net/ppp_generic.c	2007-03-23 13:04:04.000000000 +0100
+++ b/drivers/net/ppp_generic.c	2007-03-23 13:05:29.000000000 +0100
@@ -2544,6 +2544,9 @@
  	ppp->active_filter = NULL;
  #endif /* CONFIG_PPP_FILTER */

+	if (ppp->xmit_pending)
+		kfree_skb(ppp->xmit_pending);
+
  	kfree(ppp);
  }


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [irda-users] [2.6.20-rt8] "Neighbour table overflow."
  2007-03-23 12:14                                 ` Guennadi Liakhovetski
@ 2007-03-24  0:36                                   ` Samuel Ortiz
  2007-03-26  8:22                                     ` Guennadi Liakhovetski
  2007-03-25  5:10                                   ` David Miller
  2007-03-26  1:01                                   ` [irda-users] " Paul Mackerras
  2 siblings, 1 reply; 26+ messages in thread
From: Samuel Ortiz @ 2007-03-24  0:36 UTC (permalink / raw)
  To: Guennadi Liakhovetski; +Cc: irda-users, Paul Mackerras, davem, netdev

On Fri, Mar 23, 2007 at 01:14:43PM +0100, Guennadi Liakhovetski wrote:
> On Wed, 21 Mar 2007, Guennadi Liakhovetski wrote:
> 
> > On Wed, 21 Mar 2007, Samuel Ortiz wrote:
> >
> >> I'm quite sure the leak is in the IrDA code rather than in the ppp or
> >> ipv4 one, hence the need for full irda debug...
> 
> Well, looks like you were wrong, Samuel. 
Heh, it's good to be wrong sometimes :-)

> Below is a patch that fixes ONE 
> sk_buff leak (maintainer added to cc: hi, Paul:-)). Still investigating if 
> there are more there.
Are you still seeing the skb cache growing with your fix ?

Cheers,
Samuel.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [irda-users] [2.6.20-rt8] "Neighbour table overflow."
  2007-03-23 12:14                                 ` Guennadi Liakhovetski
  2007-03-24  0:36                                   ` Samuel Ortiz
@ 2007-03-25  5:10                                   ` David Miller
       [not found]                                     ` <20070324.221034.45741983.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  2007-03-26  1:02                                     ` Paul Mackerras
  2007-03-26  1:01                                   ` [irda-users] " Paul Mackerras
  2 siblings, 2 replies; 26+ messages in thread
From: David Miller @ 2007-03-25  5:10 UTC (permalink / raw)
  To: gl; +Cc: samuel, irda-users, netdev, paulus

From: Guennadi Liakhovetski <gl@dsa-ac.de>
Date: Fri, 23 Mar 2007 13:14:43 +0100 (CET)

> On Wed, 21 Mar 2007, Guennadi Liakhovetski wrote:
> 
> > On Wed, 21 Mar 2007, Samuel Ortiz wrote:
> >
> >> I'm quite sure the leak is in the IrDA code rather than in the ppp or
> >> ipv4 one, hence the need for full irda debug...
> 
> Well, looks like you were wrong, Samuel. Below is a patch that fixes ONE 
> sk_buff leak (maintainer added to cc: hi, Paul:-)). Still investigating if 
> there are more there.

This is strange.

The PPP generic layer seems to be very careful about it's handling of
the ->xmit_pending packet.

When a packet is added to ->xmit_pending, immediately ppp_push() is
called, and ppp_push() gives the packet to the channels xmit function,
and unless the xmit function returns zero the ->xmit_pending is reset
to NULL because non-zero return from the channel xmit functions means
that the driver took the packet.

Now I checked irnet_ppp.c, which is the driver under scrutiny here,
and it never ever returns zero, under any circumstance, it always
return one.

So the ->xmit_pending should always be NULL'd out by ppp_push().

There is some funny BLOCK_WHEN_CONNECT code, which will return
0 in certain cases, but that define it never set during the
build.

Nevermind... that code does get enabled. :(

This looks like it might be a bug, perhaps you can only return zero
from the transmit function when your queue really is full and you plan
to wake things up properly when space appears (via
ppp_output_wakeup()).  You can't return 0 because of an event which
might never occur, that's what makes ->xmit_pending get stuck and
leak.

I'm really surprised this leak doesn't trigger already via the
ppp_synctty.c and ppp_async.c drivers, perhaps they do something to
make sure the transmitter gets purged properly when unregistering and
therefore ->xmit_pending does not get left non-NULL at unregister
time.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [2.6.20-rt8] "Neighbour table overflow."
       [not found]                                     ` <20070324.221034.45741983.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2007-03-25 23:50                                       ` Samuel Ortiz
  2007-03-26  1:36                                         ` [irda-users] " David Miller
  0 siblings, 1 reply; 26+ messages in thread
From: Samuel Ortiz @ 2007-03-25 23:50 UTC (permalink / raw)
  To: David Miller
  Cc: gl-o/hVf8ie6tKzQB+pC5nmwQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	paulus-eUNUBHrolfbYtjvyW6yDsg,
	irda-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

On Sat, Mar 24, 2007 at 10:10:34PM -0700, David Miller wrote:
> From: Guennadi Liakhovetski <gl-o/hVf8ie6tKzQB+pC5nmwQ@public.gmane.org>
> Date: Fri, 23 Mar 2007 13:14:43 +0100 (CET)
> 
> > On Wed, 21 Mar 2007, Guennadi Liakhovetski wrote:
> > 
> > > On Wed, 21 Mar 2007, Samuel Ortiz wrote:
> > >
> > >> I'm quite sure the leak is in the IrDA code rather than in the ppp or
> > >> ipv4 one, hence the need for full irda debug...
> > 
> > Well, looks like you were wrong, Samuel. Below is a patch that fixes ONE 
> > sk_buff leak (maintainer added to cc: hi, Paul:-)). Still investigating if 
> > there are more there.
> 
> This is strange.
> 
> The PPP generic layer seems to be very careful about it's handling of
> the ->xmit_pending packet.
> 
> When a packet is added to ->xmit_pending, immediately ppp_push() is
> called, and ppp_push() gives the packet to the channels xmit function,
> and unless the xmit function returns zero the ->xmit_pending is reset
> to NULL because non-zero return from the channel xmit functions means
> that the driver took the packet.
> 
> Now I checked irnet_ppp.c, which is the driver under scrutiny here,
> and it never ever returns zero, under any circumstance, it always
> return one.
> 
> So the ->xmit_pending should always be NULL'd out by ppp_push().
> 
> There is some funny BLOCK_WHEN_CONNECT code, which will return
> 0 in certain cases, but that define it never set during the
> build.
> 
> Nevermind... that code does get enabled. :(
It is enabled, yes, from net/irda/irnet/irnet.h
It seems Jean added this option to make the connectdelay 0 pppd option
working.

 
> This looks like it might be a bug, perhaps you can only return zero
> from the transmit function when your queue really is full and you plan
> to wake things up properly when space appears (via
> ppp_output_wakeup()).  You can't return 0 because of an event which
> might never occur, that's what makes ->xmit_pending get stuck and
> leak.
I still think Guennadi's fix is correct even if you return 0 from the TX
function only when you're running out of space. If we unregister the ppp
interface before we get a chance to call ppp_output_wake(), then we'll leak
an xmit_pending skb.

I guess Paul will decide here. If he rejects Guennadi's patch, I'll come
up with some IrDA patch so that we free our pending skb from the irnet code,
or try somehow not to block the PPP tx queue while connecting.

Cheers,
Samuel.


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [irda-users] [2.6.20-rt8] "Neighbour table overflow."
  2007-03-23 12:14                                 ` Guennadi Liakhovetski
  2007-03-24  0:36                                   ` Samuel Ortiz
  2007-03-25  5:10                                   ` David Miller
@ 2007-03-26  1:01                                   ` Paul Mackerras
  2 siblings, 0 replies; 26+ messages in thread
From: Paul Mackerras @ 2007-03-26  1:01 UTC (permalink / raw)
  To: Guennadi Liakhovetski; +Cc: Samuel Ortiz, irda-users, davem, netdev, akpm

Guennadi Liakhovetski writes:

> Don't leak an sk_buff on interface destruction.
> 
> Signed-off-by: G. Liakhovetski <gl@dsa-ac.de>

Acked-by: Paul Mackerras <paulus@samba.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [irda-users] [2.6.20-rt8] "Neighbour table overflow."
  2007-03-25  5:10                                   ` David Miller
       [not found]                                     ` <20070324.221034.45741983.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2007-03-26  1:02                                     ` Paul Mackerras
       [not found]                                       ` <17927.7070.803210.600520-UYQwCShxghk5kJ7NmlRacFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 26+ messages in thread
From: Paul Mackerras @ 2007-03-26  1:02 UTC (permalink / raw)
  To: David Miller; +Cc: gl, samuel, irda-users, netdev

David Miller writes:

> The PPP generic layer seems to be very careful about it's handling of
> the ->xmit_pending packet.

Mostly, but I think that this is a genuine leak.

> I'm really surprised this leak doesn't trigger already via the
> ppp_synctty.c and ppp_async.c drivers, perhaps they do something to
> make sure the transmitter gets purged properly when unregistering and
> therefore ->xmit_pending does not get left non-NULL at unregister
> time.

At least for ppp_async, I think what saves us is that pppd does a
TCFLSH ioctl when shutting down the link, which calls
ppp_async_flush_output, which calls ppp_output_wakeup.

Paul.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [irda-users] [2.6.20-rt8] "Neighbour table overflow."
  2007-03-25 23:50                                       ` Samuel Ortiz
@ 2007-03-26  1:36                                         ` David Miller
  0 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2007-03-26  1:36 UTC (permalink / raw)
  To: samuel; +Cc: gl, irda-users, paulus, netdev

From: Samuel Ortiz <samuel@sortiz.org>
Date: Mon, 26 Mar 2007 02:50:59 +0300

> I still think Guennadi's fix is correct even if you return 0 from the TX
> function only when you're running out of space. If we unregister the ppp
> interface before we get a chance to call ppp_output_wake(), then we'll leak
> an xmit_pending skb.

I agree, especially after thinking about this some more.

It's just really weird that we've never hit this before,
since this logic has been there forever, that's all :-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [2.6.20-rt8] "Neighbour table overflow."
       [not found]                                       ` <17927.7070.803210.600520-UYQwCShxghk5kJ7NmlRacFaTQe2KTcn/@public.gmane.org>
@ 2007-03-26  1:37                                         ` David Miller
  0 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2007-03-26  1:37 UTC (permalink / raw)
  To: paulus-eUNUBHrolfbYtjvyW6yDsg
  Cc: gl-o/hVf8ie6tKzQB+pC5nmwQ, samuel-jcdQHdrhKHMdnm+yROfE0A,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	irda-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

From: Paul Mackerras <paulus-eUNUBHrolfbYtjvyW6yDsg@public.gmane.org>
Date: Mon, 26 Mar 2007 11:02:22 +1000

> David Miller writes:
> 
> > The PPP generic layer seems to be very careful about it's handling of
> > the ->xmit_pending packet.
> 
> Mostly, but I think that this is a genuine leak.
> 
> > I'm really surprised this leak doesn't trigger already via the
> > ppp_synctty.c and ppp_async.c drivers, perhaps they do something to
> > make sure the transmitter gets purged properly when unregistering and
> > therefore ->xmit_pending does not get left non-NULL at unregister
> > time.
> 
> At least for ppp_async, I think what saves us is that pppd does a
> TCFLSH ioctl when shutting down the link, which calls
> ppp_async_flush_output, which calls ppp_output_wakeup.

Ok, that makes sense, I'll apply the leak fix for sure
especially since I even have your ACK now :-)

Thanks for reviewing Paul.

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [irda-users] [2.6.20-rt8] "Neighbour table overflow."
  2007-03-24  0:36                                   ` Samuel Ortiz
@ 2007-03-26  8:22                                     ` Guennadi Liakhovetski
  0 siblings, 0 replies; 26+ messages in thread
From: Guennadi Liakhovetski @ 2007-03-26  8:22 UTC (permalink / raw)
  To: Samuel Ortiz; +Cc: irda-users, Paul Mackerras, davem, netdev

On Sat, 24 Mar 2007, Samuel Ortiz wrote:

> On Fri, Mar 23, 2007 at 01:14:43PM +0100, Guennadi Liakhovetski wrote:
>
>> Below is a patch that fixes ONE
>> sk_buff leak (maintainer added to cc: hi, Paul:-)). Still investigating if
>> there are more there.
> Are you still seeing the skb cache growing with your fix ?

No, running stable so far.

Thanks
Guennadi
---------------------------------
Guennadi Liakhovetski, Ph.D.
DSA Daten- und Systemtechnik GmbH
Pascalstr. 28
D-52076 Aachen
Germany

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2007-03-26  8:22 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-16 16:13 [2.6.20-rt8] "Neighbour table overflow." Guennadi Liakhovetski
2007-03-16 16:16 ` Ingo Molnar
2007-03-16 16:23   ` [irda-users] " Samuel Ortiz
2007-03-16 16:26   ` Guennadi Liakhovetski
2007-03-16 16:29     ` Ingo Molnar
2007-03-16 16:41     ` [irda-users] " Samuel Ortiz
2007-03-16 16:45       ` Samuel Ortiz
2007-03-16 19:15     ` David Miller
     [not found]       ` <20070316.121539.78711178.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2007-03-19 15:21         ` Guennadi Liakhovetski
2007-03-20 11:18           ` 2.6.20-rt8 APIC error joel silvestre
     [not found]           ` <Pine.LNX.4.63.0703191617250.534-pjC7uWgG1yLsj6RlE8knwQ@public.gmane.org>
2007-03-20 15:21             ` [2.6.20-rt8] "Neighbour table overflow." Guennadi Liakhovetski
2007-03-20 15:30               ` [irda-users] " Guennadi Liakhovetski
2007-03-20 16:23                 ` Samuel Ortiz
2007-03-20 17:02                   ` Guennadi Liakhovetski
     [not found]                     ` <Pine.LNX.4.63.0703210849570.534@pcgl.dsa-ac.de>
     [not found]                       ` <Pine.LNX.4.63.0703210849570.534-pjC7uWgG1yLsj6RlE8knwQ@public.gmane.org>
2007-03-21 10:51                         ` Guennadi Liakhovetski
     [not found]                           ` <Pine.LNX.4.63.0703211130140.534-pjC7uWgG1yLsj6RlE8knwQ@public.gmane.org>
2007-03-21 11:26                             ` Samuel Ortiz
2007-03-21 12:01                               ` [irda-users] " Guennadi Liakhovetski
2007-03-23 12:14                                 ` Guennadi Liakhovetski
2007-03-24  0:36                                   ` Samuel Ortiz
2007-03-26  8:22                                     ` Guennadi Liakhovetski
2007-03-25  5:10                                   ` David Miller
     [not found]                                     ` <20070324.221034.45741983.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2007-03-25 23:50                                       ` Samuel Ortiz
2007-03-26  1:36                                         ` [irda-users] " David Miller
2007-03-26  1:02                                     ` Paul Mackerras
     [not found]                                       ` <17927.7070.803210.600520-UYQwCShxghk5kJ7NmlRacFaTQe2KTcn/@public.gmane.org>
2007-03-26  1:37                                         ` David Miller
2007-03-26  1:01                                   ` [irda-users] " Paul Mackerras

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.