All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: do_IRQ: stack overflow: 872..
       [not found] <1131604877.20041218092730@mail.ru.suse.lists.linux.kernel>
@ 2004-12-18  7:50 ` Andi Kleen
  2004-12-18 11:12   ` Bart De Schuymer
  2005-01-07 17:05   ` David Woodhouse
  0 siblings, 2 replies; 53+ messages in thread
From: Andi Kleen @ 2004-12-18  7:50 UTC (permalink / raw)
  To: Crazy AMD K7; +Cc: linux-kernel, netdev

Crazy AMD K7 <snort2004@mail.ru> writes:

> Hi!
> I have found a few days ago strange messages in /var/log/messages
> More than 10 times there was do_IRQ: stack overflow: (nimber).... followed
> with code. If need I can send all this data. I have run
> ksymoops with only first 3 cases. Here is the first, the second and
> the third are in attachment.
> After that oopses my system continued to work.

It's not really an oops, just a warning that stack space got quiet tight.

The problem seems to be that the br netfilter code is nesting far too
deeply and recursing several times. Looks like a design bug to me,
it shouldn't do that.

> uname uname -a
> Linux linux 2.4.28 #2 ÷ÔÒ îÏÑ 30 15:43:35 MSK 2004 i686 unknown
> gcc -v
> Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
> gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-113)
> I have applies ebtables_brnf patch (http://bridge.sf.net) and a

Don't do that then or contact the author to fix it. Unfortunately
the code is also in 2.6 mainline.

-Andi

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: do_IRQ: stack overflow: 872..
  2004-12-18  7:50 ` do_IRQ: stack overflow: 872 Andi Kleen
@ 2004-12-18 11:12   ` Bart De Schuymer
  2004-12-18 11:14     ` Andi Kleen
  2005-01-07 17:05   ` David Woodhouse
  1 sibling, 1 reply; 53+ messages in thread
From: Bart De Schuymer @ 2004-12-18 11:12 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Crazy AMD K7, linux-kernel, netdev

Op za, 18-12-2004 te 08:50 +0100, schreef Andi Kleen:
> Crazy AMD K7 <snort2004@mail.ru> writes:
> 
> > Hi!
> > I have found a few days ago strange messages in /var/log/messages
> > More than 10 times there was do_IRQ: stack overflow: (nimber).... followed
> > with code. If need I can send all this data. I have run
> > ksymoops with only first 3 cases. Here is the first, the second and
> > the third are in attachment.
> > After that oopses my system continued to work.
> 
> It's not really an oops, just a warning that stack space got quiet tight.
> 
> The problem seems to be that the br netfilter code is nesting far too
> deeply and recursing several times. Looks like a design bug to me,
> it shouldn't do that.
> 
> > uname uname -a
> > Linux linux 2.4.28 #2 ÷ÔÒ îÏÑ 30 15:43:35 MSK 2004 i686 unknown
> > gcc -v
> > Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
> > gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-113)
> > I have applies ebtables_brnf patch (http://bridge.sf.net) and a
> 
> Don't do that then or contact the author to fix it. Unfortunately
> the code is also in 2.6 mainline.
> 

Hi.

The bridge-nf code does not use recursive function calls and there is no
long consecutive function calling. Furthermore, there is no function in
the bridge-nf code that uses a large part of the stack.
Andi, if you make such statements then please point out the code part
you have (of course) read after which you decided to make the statement.
The bridge-nf code is used by quite a few people and by commercial
companies and I have never had a report like this. AMD has been having
all sorts of strange problems for weeks now, they're all somehow related
to bridge-nf, but I doubt he is using the bridge-nf patch on a clean 2.4
kernel.
AMD, is there any chance you can use the latest 2.6 kernel, without
extra patches ?

Bart



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: do_IRQ: stack overflow: 872..
  2004-12-18 11:12   ` Bart De Schuymer
@ 2004-12-18 11:14     ` Andi Kleen
  2004-12-18 11:51       ` Bart De Schuymer
  0 siblings, 1 reply; 53+ messages in thread
From: Andi Kleen @ 2004-12-18 11:14 UTC (permalink / raw)
  To: Bart De Schuymer; +Cc: Andi Kleen, Crazy AMD K7, linux-kernel, netdev

> The bridge-nf code does not use recursive function calls and there is no
> long consecutive function calling. Furthermore, there is no function in
> the bridge-nf code that uses a large part of the stack.

Just take a look at the backtrace in the original post. It clearly
shows a problem. And it points strongly towards br-netfilter.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: do_IRQ: stack overflow: 872..
  2004-12-18 11:14     ` Andi Kleen
@ 2004-12-18 11:51       ` Bart De Schuymer
  2004-12-18 13:53         ` Andi Kleen
  0 siblings, 1 reply; 53+ messages in thread
From: Bart De Schuymer @ 2004-12-18 11:51 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Crazy AMD K7, linux-kernel, netdev

Op za, 18-12-2004 te 12:14 +0100, schreef Andi Kleen:
> > The bridge-nf code does not use recursive function calls and there is no
> > long consecutive function calling. Furthermore, there is no function in
> > the bridge-nf code that uses a large part of the stack.
> 
> Just take a look at the backtrace in the original post. It clearly
> shows a problem. And it points strongly towards br-netfilter.

I don't doubt you are a much better reader of such backtraces than me.
However, let's count the number of times a function from
net/bridge/br_netfilter.c is in the backtrace:
1. br_nf*: 6 times
2. *sabotage*: 3 times
Seriously, out of 222 lines, only 9 from bridge-nf.
The function ip_queue_xmit, OTOH, is 8 times in the trace.

Anyway, as I already suspected weeks ago, AMD must be seeing some
incompatibility between ip_queue (he's using snort) and the bridge-nf
patch.

He is using the patch (I gave it to him) below on top of the bridge-nf
patch. Before using that patch he got a kernel panic occasionally.
However he seems not to get a message in his syslog.

Bart


--- linux-2.4.28-ebt-brnf/net/bridge/br_netfilter.c.old 2004-11-27 23:43:18.000000000 +0100
+++ linux-2.4.28-ebt-brnf/net/bridge/br_netfilter.c     2004-11-27
23:52:05.000000000 +0100
@@ -870,6 +870,10 @@ static unsigned int ip_sabotage_out(unsi
 {
        struct sk_buff *skb = *pskb;
 
+if (!skb) {
+       printk("TROUBLE IN IP_SABOTAGE_OUT: skb==NULL\n");
+       goto in_trouble;
+}
 #ifdef CONFIG_SYSCTL
        if (!skb->nf_bridge) {
                struct vlan_ethhdr *hdr =
@@ -884,6 +888,10 @@ static unsigned int ip_sabotage_out(unsi
        }
 #endif
 
+if (!out) {
+       printk("TROUBLE IN IP_SABOTAGE_OUT: out == NULL\n");
+       goto in_trouble;
+}
        if ((out->hard_start_xmit == br_dev_xmit &&
            okfn != br_nf_forward_finish &&
            okfn != br_nf_local_out_finish &&
@@ -920,6 +928,9 @@ static unsigned int ip_sabotage_out(unsi
        }
 
        return NF_ACCEPT;
+in_trouble:
+       dump_stack();
+       return NF_DROP;
 }
 
 /* For br_nf_local_out we need (prio = NF_BR_PRI_FIRST), to insure that
innocent




^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: do_IRQ: stack overflow: 872..
  2004-12-18 11:51       ` Bart De Schuymer
@ 2004-12-18 13:53         ` Andi Kleen
  2004-12-18 16:07           ` Re[2]: " Crazy AMD K7
  0 siblings, 1 reply; 53+ messages in thread
From: Andi Kleen @ 2004-12-18 13:53 UTC (permalink / raw)
  To: Bart De Schuymer; +Cc: Andi Kleen, Crazy AMD K7, linux-kernel, netdev

On Sat, Dec 18, 2004 at 12:51:30PM +0100, Bart De Schuymer wrote:
> > 
> > Just take a look at the backtrace in the original post. It clearly
> > shows a problem. And it points strongly towards br-netfilter.
> 
> I don't doubt you are a much better reader of such backtraces than me.
> However, let's count the number of times a function from
> net/bridge/br_netfilter.c is in the backtrace:
> 1. br_nf*: 6 times
> 2. *sabotage*: 3 times
> Seriously, out of 222 lines, only 9 from bridge-nf.
> The function ip_queue_xmit, OTOH, is 8 times in the trace.

Yep, but ip_queue_xmit doesn't call itself recursively. Someone 
must be doing it. And that's likely the bridge code.

BTW not all of these entries are probably true, there can 
be a lot of false positives.

> Anyway, as I already suspected weeks ago, AMD must be seeing some
> incompatibility between ip_queue (he's using snort) and the bridge-nf
> patch.
> 
> He is using the patch (I gave it to him) below on top of the bridge-nf
> patch. Before using that patch he got a kernel panic occasionally.
> However he seems not to get a message in his syslog.

Ok, since this report seems to be for a totally non standard
severly hacked up kernel I suppose nothing from it can be concluded for
the mainline kernel. Thanks for clearing this up.

Note to the original poster: when you report a bug with a patched
kernel always mention it.

-Andi




^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re[2]: do_IRQ: stack overflow: 872..
  2004-12-18 13:53         ` Andi Kleen
@ 2004-12-18 16:07           ` Crazy AMD K7
  2004-12-18 16:46             ` Bart De Schuymer
  0 siblings, 1 reply; 53+ messages in thread
From: Crazy AMD K7 @ 2004-12-18 16:07 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Bart De Schuymer, linux-kernel, netdev

> Note to the original poster: when you report a bug with a patched
> kernel always mention it.
I have mentioned earlier and Bart knows it.

I use 2.4.28
+ ebtables-brnf-8_vs_2.4.28.diff
+ U32 patch from patch-o-matic-ng-20040621.tar.bz2
+ patch for br_netfilter.c made by Bart to find out why kernel panic happens(it was a few
  letters ago)
  All patches has applies cleanly.
  U32 doesn't affect on br_netfilter.c

[root@linux kernel]# md5sum linux-2.4.28.tar.bz2
ac7735000d185bc7778c08288760a8a3  linux-2.4.28.tar.bz2
(taken from http://www.ru.kernel.org/pub/linux/kernel/v2.4/linux-2.4.28.tar.bz2)

[root@linux bridge]# md5sum ebtables-brnf-8_vs_2.4.28.diff.gz
30542b1a7a502593afb4d37055ec5e35  ebtables-brnf-8_vs_2.4.28.diff.gz

[root@linux iptables]# md5sum patch-o-matic-ng-20040621.tar.bz2
4fd3c744bf55f119fef6c7c3c4acc4b6  patch-o-matic-ng-20040621.tar.bz2

If the problem will continue appear and will not be solved an any
way, of course, I will use 2.6 kernel, now I am not ready to use it.

Pasha



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Re[2]: do_IRQ: stack overflow: 872..
  2004-12-18 16:07           ` Re[2]: " Crazy AMD K7
@ 2004-12-18 16:46             ` Bart De Schuymer
  0 siblings, 0 replies; 53+ messages in thread
From: Bart De Schuymer @ 2004-12-18 16:46 UTC (permalink / raw)
  To: Crazy AMD K7; +Cc: Andi Kleen, linux-kernel, netdev

Op za, 18-12-2004 te 19:07 +0300, schreef Crazy AMD K7:
> > Note to the original poster: when you report a bug with a patched
> > kernel always mention it.
> I have mentioned earlier and Bart knows it.
> 
> I use 2.4.28
> + ebtables-brnf-8_vs_2.4.28.diff
> + U32 patch from patch-o-matic-ng-20040621.tar.bz2
> + patch for br_netfilter.c made by Bart to find out why kernel panic happens(it was a few
>   letters ago)
>   All patches has applies cleanly.
>   U32 doesn't affect on br_netfilter.c

Sorry, I don't know the ip_queue mechanism and I don't know what could
possibly go wrong.
All we know is that you no longer have kernel panics with the simple
patch I gave you (which just drops packets when a kernel panic would
happen otherwise, and tells about this with a printk). However, you
state there are no entries in your syslog that tell about this dropping.
Is your syslog working right? Do you have a console open on which kernel
messages get printed?

I still secretly suspect the snort code of inserting packets back into
the kernel that don't have an output device (I don't know if that's
possible, though).

cheers,
Bart



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: do_IRQ: stack overflow: 872..
  2004-12-18  7:50 ` do_IRQ: stack overflow: 872 Andi Kleen
  2004-12-18 11:12   ` Bart De Schuymer
@ 2005-01-07 17:05   ` David Woodhouse
  2005-01-07 18:00       ` [Bridge] " Stephen Hemminger
  1 sibling, 1 reply; 53+ messages in thread
From: David Woodhouse @ 2005-01-07 17:05 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Crazy AMD K7, linux-kernel, netdev, Stephen Hemminger

On Sat, 2004-12-18 at 08:50 +0100, Andi Kleen wrote:
> It's not really an oops, just a warning that stack space got quiet
> tight.
> 
> The problem seems to be that the br netfilter code is nesting far too
> deeply and recursing several times. Looks like a design bug to me,
> it shouldn't do that.

I don't think it's recursing -- I think the stack trace is just a bit
noisy. The problem is that the bridge code, especially with br_netfilter
in the equation, is implicated in code paths which are just _too_ deep.
This happens when you're bridging packets received in an interrupt while
you were deep in journalling code, and it's also been seen with a call
trace something like nfs->sunrpc->ip->bridge->br_netfilter.

One option might be to make br_dev_xmit() just queue the packet rather
than trying to deliver it to all the slave devices immediately. Then the
actual retransmission can be handled from a context where we're _not_
short of stack; perhaps from a dedicated kernel thread. 

Unfortunately that approach would introduce a lot of latency on all
packets we pass. Another option would be to have all architectures
provide a stack_available() function and for br_dev_xmit() to queue the
packet only if we're short of stack, while still sending most packets
immediately. 

Proof of concept below; obviously the stack_available() is an evil hack
and would need to be done more sanely. Comments?

===== net/bridge/br_device.c 1.17 vs edited =====
--- 1.17/net/bridge/br_device.c	2004-07-29 22:40:51 +01:00
+++ edited/net/bridge/br_device.c	2005-01-07 16:54:26 +00:00
@@ -19,6 +19,13 @@
 #include <asm/uaccess.h>
 #include "br_private.h"
 
+static inline unsigned long stack_available(void)
+{
+	unsigned long esp;
+	asm volatile("movl %%esp,%0" : "=r"(esp));
+	return esp - (unsigned long)current - sizeof(struct thread_info);
+}
+
 static struct net_device_stats *br_dev_get_stats(struct net_device *dev)
 {
 	struct net_bridge *br;
@@ -34,6 +41,14 @@
 	const unsigned char *dest = skb->data;
 	struct net_bridge_fdb_entry *dst;
 
+	if (stack_available() < THREAD_SIZE/2) {
+		if (net_ratelimit()) {
+			printk(KERN_DEBUG "Bridge device %s queues packet due to stack shortage\n",
+			       dev->name);
+		}
+		return NETDEV_TX_BUSY;
+	}
+
 	br->statistics.tx_packets++;
 	br->statistics.tx_bytes += skb->len;
 
@@ -104,7 +119,7 @@
 	SET_MODULE_OWNER(dev);
 	dev->stop = br_dev_stop;
 	dev->accept_fastpath = br_dev_accept_fastpath;
-	dev->tx_queue_len = 0;
+	dev->tx_queue_len = 5;
 	dev->set_mac_address = NULL;
 	dev->priv_flags = IFF_EBRIDGE;
 }




-- 
dwmw2



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: do_IRQ: stack overflow: 872..
  2005-01-07 17:05   ` David Woodhouse
@ 2005-01-07 18:00       ` Stephen Hemminger
  0 siblings, 0 replies; 53+ messages in thread
From: Stephen Hemminger @ 2005-01-07 18:00 UTC (permalink / raw)
  To: David Woodhouse, Bart De Schuymer
  Cc: Crazy AMD K7, bridge, Andi Kleen, netdev

[-- Attachment #1: Type: text/plain, Size: 1710 bytes --]

On Fri, 07 Jan 2005 17:05:59 +0000
David Woodhouse <dwmw2@infradead.org> wrote:

> On Sat, 2004-12-18 at 08:50 +0100, Andi Kleen wrote:
> > It's not really an oops, just a warning that stack space got quiet
> > tight.
> > 
> > The problem seems to be that the br netfilter code is nesting far too
> > deeply and recursing several times. Looks like a design bug to me,
> > it shouldn't do that.
> 
> I don't think it's recursing -- I think the stack trace is just a bit
> noisy. The problem is that the bridge code, especially with br_netfilter
> in the equation, is implicated in code paths which are just _too_ deep.
> This happens when you're bridging packets received in an interrupt while
> you were deep in journalling code, and it's also been seen with a call
> trace something like nfs->sunrpc->ip->bridge->br_netfilter.

Sounds like an argument for interrupt stacks.

> One option might be to make br_dev_xmit() just queue the packet rather
> than trying to deliver it to all the slave devices immediately. Then the
> actual retransmission can be handled from a context where we're _not_
> short of stack; perhaps from a dedicated kernel thread. 

Probably the solution would be to handle it in the filter code
that way if we are not filtering, we can use the interrupt path,
but if filtering just defer to a safer context (like soft irq).

> Unfortunately that approach would introduce a lot of latency on all
> packets we pass. Another option would be to have all architectures
> provide a stack_available() function and for br_dev_xmit() to queue the
> packet only if we're short of stack, while still sending most packets
> immediately. 

NO, that looks like a testablity and portablity nightmare.

[-- Attachment #2: Type: text/plain, Size: 140 bytes --]

_______________________________________________
Bridge mailing list
Bridge@lists.osdl.org
http://lists.osdl.org/mailman/listinfo/bridge

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: do_IRQ: stack overflow: 872..
@ 2005-01-07 18:00       ` Stephen Hemminger
  0 siblings, 0 replies; 53+ messages in thread
From: Stephen Hemminger @ 2005-01-07 18:00 UTC (permalink / raw)
  To: David Woodhouse, Bart De Schuymer
  Cc: Crazy AMD K7, bridge, Andi Kleen, netdev

On Fri, 07 Jan 2005 17:05:59 +0000
David Woodhouse <dwmw2@infradead.org> wrote:

> On Sat, 2004-12-18 at 08:50 +0100, Andi Kleen wrote:
> > It's not really an oops, just a warning that stack space got quiet
> > tight.
> > 
> > The problem seems to be that the br netfilter code is nesting far too
> > deeply and recursing several times. Looks like a design bug to me,
> > it shouldn't do that.
> 
> I don't think it's recursing -- I think the stack trace is just a bit
> noisy. The problem is that the bridge code, especially with br_netfilter
> in the equation, is implicated in code paths which are just _too_ deep.
> This happens when you're bridging packets received in an interrupt while
> you were deep in journalling code, and it's also been seen with a call
> trace something like nfs->sunrpc->ip->bridge->br_netfilter.

Sounds like an argument for interrupt stacks.

> One option might be to make br_dev_xmit() just queue the packet rather
> than trying to deliver it to all the slave devices immediately. Then the
> actual retransmission can be handled from a context where we're _not_
> short of stack; perhaps from a dedicated kernel thread. 

Probably the solution would be to handle it in the filter code
that way if we are not filtering, we can use the interrupt path,
but if filtering just defer to a safer context (like soft irq).

> Unfortunately that approach would introduce a lot of latency on all
> packets we pass. Another option would be to have all architectures
> provide a stack_available() function and for br_dev_xmit() to queue the
> packet only if we're short of stack, while still sending most packets
> immediately. 

NO, that looks like a testablity and portablity nightmare.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: do_IRQ: stack overflow: 872..
  2005-01-07 18:00       ` [Bridge] " Stephen Hemminger
@ 2005-01-07 18:06         ` David Woodhouse
  -1 siblings, 0 replies; 53+ messages in thread
From: David Woodhouse @ 2005-01-07 18:06 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Bart De Schuymer, Andi Kleen, Crazy AMD K7, bridge, netdev

On Fri, 2005-01-07 at 10:00 -0800, Stephen Hemminger wrote:
> > I don't think it's recursing -- I think the stack trace is just a bit
> > noisy. The problem is that the bridge code, especially with br_netfilter
> > in the equation, is implicated in code paths which are just _too_ deep.
> > This happens when you're bridging packets received in an interrupt while
> > you were deep in journalling code, and it's also been seen with a call
> > trace something like nfs->sunrpc->ip->bridge->br_netfilter.
> 
> Sounds like an argument for interrupt stacks.

The NFS case didn't involve hardware interrupts. Except for the one
which actually detected that the stack had overflowed.

> Probably the solution would be to handle it in the filter code
> that way if we are not filtering, we can use the interrupt path,
> but if filtering just defer to a safer context (like soft irq).

That's also a possibility.

> > Unfortunately that approach would introduce a lot of latency on all
> > packets we pass. Another option would be to have all architectures
> > provide a stack_available() function and for br_dev_xmit() to queue the
> > packet only if we're short of stack, while still sending most packets
> > immediately. 
> 
> NO, that looks like a testablity and portablity nightmare.

Yeah, I suppose I'm inclined to agree.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: do_IRQ: stack overflow: 872..
@ 2005-01-07 18:06         ` David Woodhouse
  0 siblings, 0 replies; 53+ messages in thread
From: David Woodhouse @ 2005-01-07 18:06 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Crazy AMD K7, bridge, Bart De Schuymer, Andi Kleen, netdev

On Fri, 2005-01-07 at 10:00 -0800, Stephen Hemminger wrote:
> > I don't think it's recursing -- I think the stack trace is just a bit
> > noisy. The problem is that the bridge code, especially with br_netfilter
> > in the equation, is implicated in code paths which are just _too_ deep.
> > This happens when you're bridging packets received in an interrupt while
> > you were deep in journalling code, and it's also been seen with a call
> > trace something like nfs->sunrpc->ip->bridge->br_netfilter.
> 
> Sounds like an argument for interrupt stacks.

The NFS case didn't involve hardware interrupts. Except for the one
which actually detected that the stack had overflowed.

> Probably the solution would be to handle it in the filter code
> that way if we are not filtering, we can use the interrupt path,
> but if filtering just defer to a safer context (like soft irq).

That's also a possibility.

> > Unfortunately that approach would introduce a lot of latency on all
> > packets we pass. Another option would be to have all architectures
> > provide a stack_available() function and for br_dev_xmit() to queue the
> > packet only if we're short of stack, while still sending most packets
> > immediately. 
> 
> NO, that looks like a testablity and portablity nightmare.

Yeah, I suppose I'm inclined to agree.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: do_IRQ: stack overflow: 872..
  2005-01-07 18:00       ` [Bridge] " Stephen Hemminger
@ 2005-01-07 21:27         ` Bart De Schuymer
  -1 siblings, 0 replies; 53+ messages in thread
From: Bart De Schuymer @ 2005-01-07 21:27 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Woodhouse, Andi Kleen, Crazy AMD K7, bridge, netdev,
	Rusty Russell, David S. Miller

Op vr, 07-01-2005 te 10:00 -0800, schreef Stephen Hemminger:
> On Fri, 07 Jan 2005 17:05:59 +0000
> David Woodhouse <dwmw2@infradead.org> wrote:
> > I don't think it's recursing -- I think the stack trace is just a bit
> > noisy. The problem is that the bridge code, especially with br_netfilter
> > in the equation, is implicated in code paths which are just _too_ deep.
> > This happens when you're bridging packets received in an interrupt while
> > you were deep in journalling code, and it's also been seen with a call
> > trace something like nfs->sunrpc->ip->bridge->br_netfilter.
> 
> Sounds like an argument for interrupt stacks.
> 
> > One option might be to make br_dev_xmit() just queue the packet rather
> > than trying to deliver it to all the slave devices immediately. Then the
> > actual retransmission can be handled from a context where we're _not_
> > short of stack; perhaps from a dedicated kernel thread. 
> 
> Probably the solution would be to handle it in the filter code
> that way if we are not filtering, we can use the interrupt path,
> but if filtering just defer to a safer context (like soft irq).

How about something like the patch below (untested but compiles)?
The current netfilter scheme adds one function call to the call chain
for each NF_HOOK and NF_HOOK_THRESH. This can be prevented by executing
the okfn in the calling function instead of in nf_hook_slow().
I didn't check if there's any code that actually uses the return value
from NF_HOOK. If so, this patch won't work well in its current form as -
EPERM is now also returned for NF_QUEUE and NF_STOLEN.

Another 2 calls of okfn can be postponed in br_netfilter.c by adding
NF_STOP, which would work like NF_STOLEN except that okfn is still
called. But I'd first like to get the IPv4/IPv6 fix for br_netfilter.c
accepted (see another thread on netdev).

cheers,
Bart

--- linux-2.6.10/include/linux/netfilter.h.old	2005-01-07 20:41:30.000000000 +0100
+++ linux-2.6.10/include/linux/netfilter.h	2005-01-07 22:04:57.096626176 +0100
@@ -138,21 +138,32 @@ void nf_log_packet(int pf,
 /* This is gross, but inline doesn't cut it for avoiding the function
    call in fast path: gcc doesn't inline (needs value tracking?). --RR */
 #ifdef CONFIG_NETFILTER_DEBUG
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
- nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN)
-#define NF_HOOK_THRESH nf_hook_slow
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			    \
+({int __ret = nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN); \
+if (!__ret)								    \
+	__ret = (okfn)(skb);						    \
+__ret;})
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	    \
+({int __ret = nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh);  \
+if (!__ret)								    \
+	__ret = (okfn)(skb);						    \
+__ret;})
 #else
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
-(list_empty(&nf_hooks[(pf)][(hook)])					\
- ? (okfn)(skb)								\
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN))
-#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	\
-(list_empty(&nf_hooks[(pf)][(hook)])					\
- ? (okfn)(skb)								\
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), (thresh)))
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			     \
+({int __ret = 0;							     \
+if (list_empty(&nf_hooks[pf][hook]) ||					     \
+    !(__ret = nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN))) \
+	__ret = (okfn)(skb);						     \
+__ret;})
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	     \
+({int __ret = 0;							     \
+if (list_empty(&nf_hooks[pf][hook]) ||					     \
+    !(__ret = nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh)))  \
+	__ret = (okfn)(skb);						     \
+__ret;})
 #endif
 
-int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
+int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
 		 struct net_device *indev, struct net_device *outdev,
 		 int (*okfn)(struct sk_buff *), int thresh);
 
--- linux-2.6.10/net/core/netfilter.c.old	2005-01-07 20:41:47.000000000 +0100
+++ linux-2.6.10/net/core/netfilter.c	2005-01-07 21:52:13.659686232 +0100
@@ -494,7 +494,7 @@ static int nf_queue(struct sk_buff *skb,
 	return 1;
 }
 
-int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
+int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
 		 struct net_device *indev,
 		 struct net_device *outdev,
 		 int (*okfn)(struct sk_buff *),
@@ -508,30 +508,28 @@ int nf_hook_slow(int pf, unsigned int ho
 	rcu_read_lock();
 
 #ifdef CONFIG_NETFILTER_DEBUG
-	if (skb->nf_debug & (1 << hook)) {
+	if ((*pskb)->nf_debug & (1 << hook)) {
 		printk("nf_hook: hook %i already set.\n", hook);
-		nf_dump_skb(pf, skb);
+		nf_dump_skb(pf, *pskb);
 	}
 	skb->nf_debug |= (1 << hook);
 #endif
 
 	elem = &nf_hooks[pf][hook];
  next_hook:
-	verdict = nf_iterate(&nf_hooks[pf][hook], &skb, hook, indev,
+	verdict = nf_iterate(&nf_hooks[pf][hook], pskb, hook, indev,
 			     outdev, &elem, okfn, hook_thresh);
 	if (verdict == NF_QUEUE) {
 		NFDEBUG("nf_hook: Verdict = QUEUE.\n");
-		if (!nf_queue(skb, elem, pf, hook, indev, outdev, okfn))
+		if (!nf_queue(*pskb, elem, pf, hook, indev, outdev, okfn))
 			goto next_hook;
 	}
 
 	switch (verdict) {
-	case NF_ACCEPT:
-		ret = okfn(skb);
-		break;
-
 	case NF_DROP:
-		kfree_skb(skb);
+		kfree_skb(*pskb);
+	case NF_STOLEN:
+	case NF_QUEUE:
 		ret = -EPERM;
 		break;
 	}

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: do_IRQ: stack overflow: 872..
@ 2005-01-07 21:27         ` Bart De Schuymer
  0 siblings, 0 replies; 53+ messages in thread
From: Bart De Schuymer @ 2005-01-07 21:27 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, bridge, Crazy AMD K7, Rusty Russell, David Woodhouse,
	Andi Kleen, David S. Miller

Op vr, 07-01-2005 te 10:00 -0800, schreef Stephen Hemminger:
> On Fri, 07 Jan 2005 17:05:59 +0000
> David Woodhouse <dwmw2@infradead.org> wrote:
> > I don't think it's recursing -- I think the stack trace is just a bit
> > noisy. The problem is that the bridge code, especially with br_netfilter
> > in the equation, is implicated in code paths which are just _too_ deep.
> > This happens when you're bridging packets received in an interrupt while
> > you were deep in journalling code, and it's also been seen with a call
> > trace something like nfs->sunrpc->ip->bridge->br_netfilter.
> 
> Sounds like an argument for interrupt stacks.
> 
> > One option might be to make br_dev_xmit() just queue the packet rather
> > than trying to deliver it to all the slave devices immediately. Then the
> > actual retransmission can be handled from a context where we're _not_
> > short of stack; perhaps from a dedicated kernel thread. 
> 
> Probably the solution would be to handle it in the filter code
> that way if we are not filtering, we can use the interrupt path,
> but if filtering just defer to a safer context (like soft irq).

How about something like the patch below (untested but compiles)?
The current netfilter scheme adds one function call to the call chain
for each NF_HOOK and NF_HOOK_THRESH. This can be prevented by executing
the okfn in the calling function instead of in nf_hook_slow().
I didn't check if there's any code that actually uses the return value
from NF_HOOK. If so, this patch won't work well in its current form as -
EPERM is now also returned for NF_QUEUE and NF_STOLEN.

Another 2 calls of okfn can be postponed in br_netfilter.c by adding
NF_STOP, which would work like NF_STOLEN except that okfn is still
called. But I'd first like to get the IPv4/IPv6 fix for br_netfilter.c
accepted (see another thread on netdev).

cheers,
Bart

--- linux-2.6.10/include/linux/netfilter.h.old	2005-01-07 20:41:30.000000000 +0100
+++ linux-2.6.10/include/linux/netfilter.h	2005-01-07 22:04:57.096626176 +0100
@@ -138,21 +138,32 @@ void nf_log_packet(int pf,
 /* This is gross, but inline doesn't cut it for avoiding the function
    call in fast path: gcc doesn't inline (needs value tracking?). --RR */
 #ifdef CONFIG_NETFILTER_DEBUG
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
- nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN)
-#define NF_HOOK_THRESH nf_hook_slow
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			    \
+({int __ret = nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN); \
+if (!__ret)								    \
+	__ret = (okfn)(skb);						    \
+__ret;})
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	    \
+({int __ret = nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh);  \
+if (!__ret)								    \
+	__ret = (okfn)(skb);						    \
+__ret;})
 #else
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
-(list_empty(&nf_hooks[(pf)][(hook)])					\
- ? (okfn)(skb)								\
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN))
-#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	\
-(list_empty(&nf_hooks[(pf)][(hook)])					\
- ? (okfn)(skb)								\
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), (thresh)))
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			     \
+({int __ret = 0;							     \
+if (list_empty(&nf_hooks[pf][hook]) ||					     \
+    !(__ret = nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN))) \
+	__ret = (okfn)(skb);						     \
+__ret;})
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	     \
+({int __ret = 0;							     \
+if (list_empty(&nf_hooks[pf][hook]) ||					     \
+    !(__ret = nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh)))  \
+	__ret = (okfn)(skb);						     \
+__ret;})
 #endif
 
-int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
+int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
 		 struct net_device *indev, struct net_device *outdev,
 		 int (*okfn)(struct sk_buff *), int thresh);
 
--- linux-2.6.10/net/core/netfilter.c.old	2005-01-07 20:41:47.000000000 +0100
+++ linux-2.6.10/net/core/netfilter.c	2005-01-07 21:52:13.659686232 +0100
@@ -494,7 +494,7 @@ static int nf_queue(struct sk_buff *skb,
 	return 1;
 }
 
-int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
+int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
 		 struct net_device *indev,
 		 struct net_device *outdev,
 		 int (*okfn)(struct sk_buff *),
@@ -508,30 +508,28 @@ int nf_hook_slow(int pf, unsigned int ho
 	rcu_read_lock();
 
 #ifdef CONFIG_NETFILTER_DEBUG
-	if (skb->nf_debug & (1 << hook)) {
+	if ((*pskb)->nf_debug & (1 << hook)) {
 		printk("nf_hook: hook %i already set.\n", hook);
-		nf_dump_skb(pf, skb);
+		nf_dump_skb(pf, *pskb);
 	}
 	skb->nf_debug |= (1 << hook);
 #endif
 
 	elem = &nf_hooks[pf][hook];
  next_hook:
-	verdict = nf_iterate(&nf_hooks[pf][hook], &skb, hook, indev,
+	verdict = nf_iterate(&nf_hooks[pf][hook], pskb, hook, indev,
 			     outdev, &elem, okfn, hook_thresh);
 	if (verdict == NF_QUEUE) {
 		NFDEBUG("nf_hook: Verdict = QUEUE.\n");
-		if (!nf_queue(skb, elem, pf, hook, indev, outdev, okfn))
+		if (!nf_queue(*pskb, elem, pf, hook, indev, outdev, okfn))
 			goto next_hook;
 	}
 
 	switch (verdict) {
-	case NF_ACCEPT:
-		ret = okfn(skb);
-		break;
-
 	case NF_DROP:
-		kfree_skb(skb);
+		kfree_skb(*pskb);
+	case NF_STOLEN:
+	case NF_QUEUE:
 		ret = -EPERM;
 		break;
 	}




^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: do_IRQ: stack overflow: 872..
  2005-01-07 21:27         ` [Bridge] " Bart De Schuymer
@ 2005-01-18 21:57           ` David S. Miller
  -1 siblings, 0 replies; 53+ messages in thread
From: David S. Miller @ 2005-01-18 21:57 UTC (permalink / raw)
  To: Bart De Schuymer; +Cc: shemminger, dwmw2, ak, snort2004, bridge, netdev, rusty

On Fri, 07 Jan 2005 22:27:21 +0100
Bart De Schuymer <bdschuym@pandora.be> wrote:

> How about something like the patch below (untested but compiles)?
> The current netfilter scheme adds one function call to the call chain
> for each NF_HOOK and NF_HOOK_THRESH. This can be prevented by executing
> the okfn in the calling function instead of in nf_hook_slow().
> I didn't check if there's any code that actually uses the return value
> from NF_HOOK. If so, this patch won't work well in its current form as -
> EPERM is now also returned for NF_QUEUE and NF_STOLEN.
> 
> Another 2 calls of okfn can be postponed in br_netfilter.c by adding
> NF_STOP, which would work like NF_STOLEN except that okfn is still
> called. But I'd first like to get the IPv4/IPv6 fix for br_netfilter.c
> accepted (see another thread on netdev).

I believe I put in your ipv4/ipv6 br_netfilter fix already.

This NF_HOOK() change looks interesting.  Could we also do something like
running the deeper ->hard_start_xmit() via a triggered tasklet or something
similar?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: do_IRQ: stack overflow: 872..
@ 2005-01-18 21:57           ` David S. Miller
  0 siblings, 0 replies; 53+ messages in thread
From: David S. Miller @ 2005-01-18 21:57 UTC (permalink / raw)
  To: Bart De Schuymer; +Cc: netdev, bridge, snort2004, rusty, dwmw2, ak, shemminger

On Fri, 07 Jan 2005 22:27:21 +0100
Bart De Schuymer <bdschuym@pandora.be> wrote:

> How about something like the patch below (untested but compiles)?
> The current netfilter scheme adds one function call to the call chain
> for each NF_HOOK and NF_HOOK_THRESH. This can be prevented by executing
> the okfn in the calling function instead of in nf_hook_slow().
> I didn't check if there's any code that actually uses the return value
> from NF_HOOK. If so, this patch won't work well in its current form as -
> EPERM is now also returned for NF_QUEUE and NF_STOLEN.
> 
> Another 2 calls of okfn can be postponed in br_netfilter.c by adding
> NF_STOP, which would work like NF_STOLEN except that okfn is still
> called. But I'd first like to get the IPv4/IPv6 fix for br_netfilter.c
> accepted (see another thread on netdev).

I believe I put in your ipv4/ipv6 br_netfilter fix already.

This NF_HOOK() change looks interesting.  Could we also do something like
running the deeper ->hard_start_xmit() via a triggered tasklet or something
similar?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH/RFC] Reduce call chain length in netfilter (was: Re: do_IRQ: stack overflow: 872..)
  2005-01-18 21:57           ` [Bridge] " David S. Miller
@ 2005-01-22 22:30             ` Bart De Schuymer
  -1 siblings, 0 replies; 53+ messages in thread
From: Bart De Schuymer @ 2005-01-22 22:30 UTC (permalink / raw)
  To: David S. Miller; +Cc: shemminger, dwmw2, ak, snort2004, bridge, netdev, rusty

Op di, 18-01-2005 te 13:57 -0800, schreef David S. Miller:
> On Fri, 07 Jan 2005 22:27:21 +0100
> Bart De Schuymer <bdschuym@pandora.be> wrote:
> 
> > How about something like the patch below (untested but compiles)?
> > The current netfilter scheme adds one function call to the call chain
> > for each NF_HOOK and NF_HOOK_THRESH. This can be prevented by executing
> > the okfn in the calling function instead of in nf_hook_slow().
> > I didn't check if there's any code that actually uses the return value
> > from NF_HOOK. If so, this patch won't work well in its current form as -
> > EPERM is now also returned for NF_QUEUE and NF_STOLEN.
> > 
> > Another 2 calls of okfn can be postponed in br_netfilter.c by adding
> > NF_STOP, which would work like NF_STOLEN except that okfn is still
> > called.
> This NF_HOOK() change looks interesting.  Could we also do something like
> running the deeper ->hard_start_xmit() via a triggered tasklet or something
> similar?

Hi,

The patch below reduces the call chain length for netfilter hooks by
executing the okfn() inside the caller function instead of inside
nf_hook_slow(). It also reduces the size of netfilter.o from 129590 to
129550 on my system.
The return value for NF_HOOK stays the same as before: 0 except when the
verdict was NF_DROP (in that case the return value becomes -EPERM).
The target NF_STOP is added to postpone two executions of okfn() in
br_netfilter.c, it works like NF_STOLEN except that the okfn() will
still be called.
The downside is that all .o files using NF_HOOK will enlarge.
Comments are welcome.

cheers,
Bart

Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>

--- linux-2.6.1-rc1/include/linux/netfilter.h.old	2005-01-20 21:44:22.000000000 +0100
+++ linux-2.6.1-rc1/include/linux/netfilter.h	2005-01-22 22:06:14.000000000 +0100
@@ -18,7 +18,8 @@
 #define NF_STOLEN 2
 #define NF_QUEUE 3
 #define NF_REPEAT 4
-#define NF_MAX_VERDICT NF_REPEAT
+#define NF_STOP 5
+#define NF_MAX_VERDICT NF_STOP
 
 /* Generic cache responses from hook functions.
    <= 0x2000 is used for protocol-flags. */
@@ -138,23 +139,34 @@ void nf_log_packet(int pf,
 /* This is gross, but inline doesn't cut it for avoiding the function
    call in fast path: gcc doesn't inline (needs value tracking?). --RR */
 #ifdef CONFIG_NETFILTER_DEBUG
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
- nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN)
-#define NF_HOOK_THRESH nf_hook_slow
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			    \
+({int __ret = 0;							    \
+if (!nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN, &__ret);  \
+	__ret = (okfn)(skb);						    \
+__ret;})
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	    \
+({int __ret = 0;							    \
+if (!nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh, &__ret);   \
+	__ret = (okfn)(skb);						    \
+__ret;})
 #else
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
-(list_empty(&nf_hooks[(pf)][(hook)])					\
- ? (okfn)(skb)								\
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN))
-#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	\
-(list_empty(&nf_hooks[(pf)][(hook)])					\
- ? (okfn)(skb)								\
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), (thresh)))
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			     \
+({int __ret = 0;							     \
+if (list_empty(&nf_hooks[pf][hook]) ||					     \
+    !nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN, &__ret))   \
+	__ret = (okfn)(skb);						     \
+__ret;})
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	     \
+({int __ret = 0;							     \
+if (list_empty(&nf_hooks[pf][hook]) ||					     \
+    !nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh, &__ret))    \
+	__ret = (okfn)(skb);						     \
+__ret;})
 #endif
 
-int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
+int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
 		 struct net_device *indev, struct net_device *outdev,
-		 int (*okfn)(struct sk_buff *), int thresh);
+		 int (*okfn)(struct sk_buff *), int thresh, int *ret);
 
 /* Call setsockopt() */
 int nf_setsockopt(struct sock *sk, int pf, int optval, char __user *opt, 
--- linux-2.6.1-rc1/net/core/netfilter.c.old	2005-01-20 21:38:59.000000000 +0100
+++ linux-2.6.1-rc1/net/core/netfilter.c	2005-01-22 22:08:41.000000000 +0100
@@ -349,6 +349,8 @@ static unsigned int nf_iterate(struct li
 			       int (*okfn)(struct sk_buff *),
 			       int hook_thresh)
 {
+	unsigned int verdict;
+
 	/*
 	 * The caller must not block between calls to this
 	 * function because of risk of continuing from deleted element.
@@ -361,28 +363,18 @@ static unsigned int nf_iterate(struct li
 
 		/* Optimization: we don't need to hold module
                    reference here, since function can't sleep. --RR */
-		switch (elem->hook(hook, skb, indev, outdev, okfn)) {
-		case NF_QUEUE:
-			return NF_QUEUE;
-
-		case NF_STOLEN:
-			return NF_STOLEN;
-
-		case NF_DROP:
-			return NF_DROP;
-
-		case NF_REPEAT:
-			*i = (*i)->prev;
-			break;
-
+		verdict = elem->hook(hook, skb, indev, outdev, okfn);
+		if (verdict != NF_ACCEPT) {
 #ifdef CONFIG_NETFILTER_DEBUG
-		case NF_ACCEPT:
-			break;
-
-		default:
-			NFDEBUG("Evil return from %p(%u).\n", 
-				elem->hook, hook);
+			if (verdict > NF_MAX_VERDICT) {
+				NFDEBUG("Evil return from %p(%u).\n",
+				        elem->hook, hook);
+				continue;
+			}
 #endif
+			if (verdict != NF_REPEAT)
+				return verdict;
+			*i = (*i)->prev;
 		}
 	}
 	return NF_ACCEPT;
@@ -494,50 +486,47 @@ static int nf_queue(struct sk_buff *skb,
 	return 1;
 }
 
-int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
+/* Returns 0 if okfn() needs to be executed by the caller, -EPERM otherwise.
+ * Assumes *ret==0 when called. On return, *ret!=0 when verdict==NF_DROP */
+int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
 		 struct net_device *indev,
 		 struct net_device *outdev,
 		 int (*okfn)(struct sk_buff *),
-		 int hook_thresh)
+		 int hook_thresh, int *ret)
 {
 	struct list_head *elem;
 	unsigned int verdict;
-	int ret = 0;
+	int ret2 = 0;
 
 	/* We may already have this, but read-locks nest anyway */
 	rcu_read_lock();
 
 #ifdef CONFIG_NETFILTER_DEBUG
-	if (skb->nf_debug & (1 << hook)) {
+	if ((*pskb)->nf_debug & (1 << hook)) {
 		printk("nf_hook: hook %i already set.\n", hook);
-		nf_dump_skb(pf, skb);
+		nf_dump_skb(pf, *pskb);
 	}
 	skb->nf_debug |= (1 << hook);
 #endif
 
 	elem = &nf_hooks[pf][hook];
  next_hook:
-	verdict = nf_iterate(&nf_hooks[pf][hook], &skb, hook, indev,
+	verdict = nf_iterate(&nf_hooks[pf][hook], pskb, hook, indev,
 			     outdev, &elem, okfn, hook_thresh);
-	if (verdict == NF_QUEUE) {
+	if (verdict == NF_ACCEPT || verdict == NF_STOP)
+		goto unlock;
+	else if (verdict == NF_DROP) {
+		kfree_skb(*pskb);
+		*ret = -EPERM;
+	} else if (verdict == NF_QUEUE) {
 		NFDEBUG("nf_hook: Verdict = QUEUE.\n");
-		if (!nf_queue(skb, elem, pf, hook, indev, outdev, okfn))
+		if (!nf_queue(*pskb, elem, pf, hook, indev, outdev, okfn))
 			goto next_hook;
 	}
-
-	switch (verdict) {
-	case NF_ACCEPT:
-		ret = okfn(skb);
-		break;
-
-	case NF_DROP:
-		kfree_skb(skb);
-		ret = -EPERM;
-		break;
-	}
-
+	ret2 = -EPERM;
+unlock:
 	rcu_read_unlock();
-	return ret;
+	return ret2;
 }
 
 void nf_reinject(struct sk_buff *skb, struct nf_info *info,
--- linux-2.6.1-rc1/net/bridge/br_netfilter.c.old	2005-01-20 21:43:08.000000000 +0100
+++ linux-2.6.1-rc1/net/bridge/br_netfilter.c	2005-01-22 21:20:06.000000000 +0100
@@ -829,8 +829,7 @@ static unsigned int ip_sabotage_in(unsig
 {
 	if ((*pskb)->nf_bridge &&
 	    !((*pskb)->nf_bridge->mask & BRNF_NF_BRIDGE_PREROUTING)) {
-		okfn(*pskb);
-		return NF_STOLEN;
+		return NF_STOP;
 	}
 
 	return NF_ACCEPT;
@@ -888,8 +887,7 @@ static unsigned int ip_sabotage_out(unsi
 		if (out->priv_flags & IFF_802_1Q_VLAN)
 			nf_bridge->netoutdev = (struct net_device *)out;
 #endif
-		okfn(skb);
-		return NF_STOLEN;
+		return NF_STOP;
 	}
 
 	return NF_ACCEPT;

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] [PATCH/RFC] Reduce call chain length in netfilter (was: Re: do_IRQ: stack overflow: 872..)
@ 2005-01-22 22:30             ` Bart De Schuymer
  0 siblings, 0 replies; 53+ messages in thread
From: Bart De Schuymer @ 2005-01-22 22:30 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, bridge, snort2004, rusty, dwmw2, ak, shemminger

Op di, 18-01-2005 te 13:57 -0800, schreef David S. Miller:
> On Fri, 07 Jan 2005 22:27:21 +0100
> Bart De Schuymer <bdschuym@pandora.be> wrote:
> 
> > How about something like the patch below (untested but compiles)?
> > The current netfilter scheme adds one function call to the call chain
> > for each NF_HOOK and NF_HOOK_THRESH. This can be prevented by executing
> > the okfn in the calling function instead of in nf_hook_slow().
> > I didn't check if there's any code that actually uses the return value
> > from NF_HOOK. If so, this patch won't work well in its current form as -
> > EPERM is now also returned for NF_QUEUE and NF_STOLEN.
> > 
> > Another 2 calls of okfn can be postponed in br_netfilter.c by adding
> > NF_STOP, which would work like NF_STOLEN except that okfn is still
> > called.
> This NF_HOOK() change looks interesting.  Could we also do something like
> running the deeper ->hard_start_xmit() via a triggered tasklet or something
> similar?

Hi,

The patch below reduces the call chain length for netfilter hooks by
executing the okfn() inside the caller function instead of inside
nf_hook_slow(). It also reduces the size of netfilter.o from 129590 to
129550 on my system.
The return value for NF_HOOK stays the same as before: 0 except when the
verdict was NF_DROP (in that case the return value becomes -EPERM).
The target NF_STOP is added to postpone two executions of okfn() in
br_netfilter.c, it works like NF_STOLEN except that the okfn() will
still be called.
The downside is that all .o files using NF_HOOK will enlarge.
Comments are welcome.

cheers,
Bart

Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>

--- linux-2.6.1-rc1/include/linux/netfilter.h.old	2005-01-20 21:44:22.000000000 +0100
+++ linux-2.6.1-rc1/include/linux/netfilter.h	2005-01-22 22:06:14.000000000 +0100
@@ -18,7 +18,8 @@
 #define NF_STOLEN 2
 #define NF_QUEUE 3
 #define NF_REPEAT 4
-#define NF_MAX_VERDICT NF_REPEAT
+#define NF_STOP 5
+#define NF_MAX_VERDICT NF_STOP
 
 /* Generic cache responses from hook functions.
    <= 0x2000 is used for protocol-flags. */
@@ -138,23 +139,34 @@ void nf_log_packet(int pf,
 /* This is gross, but inline doesn't cut it for avoiding the function
    call in fast path: gcc doesn't inline (needs value tracking?). --RR */
 #ifdef CONFIG_NETFILTER_DEBUG
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
- nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN)
-#define NF_HOOK_THRESH nf_hook_slow
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			    \
+({int __ret = 0;							    \
+if (!nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN, &__ret);  \
+	__ret = (okfn)(skb);						    \
+__ret;})
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	    \
+({int __ret = 0;							    \
+if (!nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh, &__ret);   \
+	__ret = (okfn)(skb);						    \
+__ret;})
 #else
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
-(list_empty(&nf_hooks[(pf)][(hook)])					\
- ? (okfn)(skb)								\
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN))
-#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	\
-(list_empty(&nf_hooks[(pf)][(hook)])					\
- ? (okfn)(skb)								\
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), (thresh)))
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			     \
+({int __ret = 0;							     \
+if (list_empty(&nf_hooks[pf][hook]) ||					     \
+    !nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN, &__ret))   \
+	__ret = (okfn)(skb);						     \
+__ret;})
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	     \
+({int __ret = 0;							     \
+if (list_empty(&nf_hooks[pf][hook]) ||					     \
+    !nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh, &__ret))    \
+	__ret = (okfn)(skb);						     \
+__ret;})
 #endif
 
-int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
+int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
 		 struct net_device *indev, struct net_device *outdev,
-		 int (*okfn)(struct sk_buff *), int thresh);
+		 int (*okfn)(struct sk_buff *), int thresh, int *ret);
 
 /* Call setsockopt() */
 int nf_setsockopt(struct sock *sk, int pf, int optval, char __user *opt, 
--- linux-2.6.1-rc1/net/core/netfilter.c.old	2005-01-20 21:38:59.000000000 +0100
+++ linux-2.6.1-rc1/net/core/netfilter.c	2005-01-22 22:08:41.000000000 +0100
@@ -349,6 +349,8 @@ static unsigned int nf_iterate(struct li
 			       int (*okfn)(struct sk_buff *),
 			       int hook_thresh)
 {
+	unsigned int verdict;
+
 	/*
 	 * The caller must not block between calls to this
 	 * function because of risk of continuing from deleted element.
@@ -361,28 +363,18 @@ static unsigned int nf_iterate(struct li
 
 		/* Optimization: we don't need to hold module
                    reference here, since function can't sleep. --RR */
-		switch (elem->hook(hook, skb, indev, outdev, okfn)) {
-		case NF_QUEUE:
-			return NF_QUEUE;
-
-		case NF_STOLEN:
-			return NF_STOLEN;
-
-		case NF_DROP:
-			return NF_DROP;
-
-		case NF_REPEAT:
-			*i = (*i)->prev;
-			break;
-
+		verdict = elem->hook(hook, skb, indev, outdev, okfn);
+		if (verdict != NF_ACCEPT) {
 #ifdef CONFIG_NETFILTER_DEBUG
-		case NF_ACCEPT:
-			break;
-
-		default:
-			NFDEBUG("Evil return from %p(%u).\n", 
-				elem->hook, hook);
+			if (verdict > NF_MAX_VERDICT) {
+				NFDEBUG("Evil return from %p(%u).\n",
+				        elem->hook, hook);
+				continue;
+			}
 #endif
+			if (verdict != NF_REPEAT)
+				return verdict;
+			*i = (*i)->prev;
 		}
 	}
 	return NF_ACCEPT;
@@ -494,50 +486,47 @@ static int nf_queue(struct sk_buff *skb,
 	return 1;
 }
 
-int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
+/* Returns 0 if okfn() needs to be executed by the caller, -EPERM otherwise.
+ * Assumes *ret==0 when called. On return, *ret!=0 when verdict==NF_DROP */
+int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
 		 struct net_device *indev,
 		 struct net_device *outdev,
 		 int (*okfn)(struct sk_buff *),
-		 int hook_thresh)
+		 int hook_thresh, int *ret)
 {
 	struct list_head *elem;
 	unsigned int verdict;
-	int ret = 0;
+	int ret2 = 0;
 
 	/* We may already have this, but read-locks nest anyway */
 	rcu_read_lock();
 
 #ifdef CONFIG_NETFILTER_DEBUG
-	if (skb->nf_debug & (1 << hook)) {
+	if ((*pskb)->nf_debug & (1 << hook)) {
 		printk("nf_hook: hook %i already set.\n", hook);
-		nf_dump_skb(pf, skb);
+		nf_dump_skb(pf, *pskb);
 	}
 	skb->nf_debug |= (1 << hook);
 #endif
 
 	elem = &nf_hooks[pf][hook];
  next_hook:
-	verdict = nf_iterate(&nf_hooks[pf][hook], &skb, hook, indev,
+	verdict = nf_iterate(&nf_hooks[pf][hook], pskb, hook, indev,
 			     outdev, &elem, okfn, hook_thresh);
-	if (verdict == NF_QUEUE) {
+	if (verdict == NF_ACCEPT || verdict == NF_STOP)
+		goto unlock;
+	else if (verdict == NF_DROP) {
+		kfree_skb(*pskb);
+		*ret = -EPERM;
+	} else if (verdict == NF_QUEUE) {
 		NFDEBUG("nf_hook: Verdict = QUEUE.\n");
-		if (!nf_queue(skb, elem, pf, hook, indev, outdev, okfn))
+		if (!nf_queue(*pskb, elem, pf, hook, indev, outdev, okfn))
 			goto next_hook;
 	}
-
-	switch (verdict) {
-	case NF_ACCEPT:
-		ret = okfn(skb);
-		break;
-
-	case NF_DROP:
-		kfree_skb(skb);
-		ret = -EPERM;
-		break;
-	}
-
+	ret2 = -EPERM;
+unlock:
 	rcu_read_unlock();
-	return ret;
+	return ret2;
 }
 
 void nf_reinject(struct sk_buff *skb, struct nf_info *info,
--- linux-2.6.1-rc1/net/bridge/br_netfilter.c.old	2005-01-20 21:43:08.000000000 +0100
+++ linux-2.6.1-rc1/net/bridge/br_netfilter.c	2005-01-22 21:20:06.000000000 +0100
@@ -829,8 +829,7 @@ static unsigned int ip_sabotage_in(unsig
 {
 	if ((*pskb)->nf_bridge &&
 	    !((*pskb)->nf_bridge->mask & BRNF_NF_BRIDGE_PREROUTING)) {
-		okfn(*pskb);
-		return NF_STOLEN;
+		return NF_STOP;
 	}
 
 	return NF_ACCEPT;
@@ -888,8 +887,7 @@ static unsigned int ip_sabotage_out(unsi
 		if (out->priv_flags & IFF_802_1Q_VLAN)
 			nf_bridge->netoutdev = (struct net_device *)out;
 #endif
-		okfn(skb);
-		return NF_STOLEN;
+		return NF_STOP;
 	}
 
 	return NF_ACCEPT;




^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH/RFC] Reduce call chain length in netfilter (was: Re: do_IRQ: stack overflow: 872..)
  2005-01-22 22:30             ` [Bridge] " Bart De Schuymer
@ 2005-01-22 23:22               ` Martin Josefsson
  -1 siblings, 0 replies; 53+ messages in thread
From: Martin Josefsson @ 2005-01-22 23:22 UTC (permalink / raw)
  To: Bart De Schuymer
  Cc: David S. Miller, shemminger, dwmw2, ak, snort2004, bridge,
	netdev, Rusty Russell

[-- Attachment #1: Type: text/plain, Size: 1557 bytes --]

On Sat, 2005-01-22 at 23:30 +0100, Bart De Schuymer wrote:

Hi Bart

> @@ -138,23 +139,34 @@ void nf_log_packet(int pf,
>  /* This is gross, but inline doesn't cut it for avoiding the function
>     call in fast path: gcc doesn't inline (needs value tracking?). --RR */
>  #ifdef CONFIG_NETFILTER_DEBUG
> -#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
> - nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN)
> -#define NF_HOOK_THRESH nf_hook_slow
> +#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			    \
> +({int __ret = 0;							    \
> +if (!nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN, &__ret);  \
> +	__ret = (okfn)(skb);						    \
> +__ret;})
> +#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	    \
> +({int __ret = 0;							    \
> +if (!nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh, &__ret);   \
> +	__ret = (okfn)(skb);						    \
> +__ret;})
>  #else

I guess you never testcompiled with CONFIG_NETFILTER_DEBUG set :)
The if-statements above needs to have ; replaced with )

> +			if (verdict > NF_MAX_VERDICT) {
> +				NFDEBUG("Evil return from %p(%u).\n",
> +				        elem->hook, hook);
> +				continue;
> +			}

Maybe add unlikely() around the test?

Otherwise the changes look sane.

The reoganisation of things in nf_hook_slow() shouldn't cause any
performance changes, I tried to benchmark various variations of that
code some time ago but the result of the changes were more or less in
the noise.

-- 
/Martin

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: [PATCH/RFC] Reduce call chain length in netfilter (was: Re: do_IRQ: stack overflow: 872..)
@ 2005-01-22 23:22               ` Martin Josefsson
  0 siblings, 0 replies; 53+ messages in thread
From: Martin Josefsson @ 2005-01-22 23:22 UTC (permalink / raw)
  To: Bart De Schuymer
  Cc: netdev, bridge, snort2004, Rusty Russell, dwmw2, David S. Miller,
	shemminger, ak

[-- Attachment #1: Type: text/plain, Size: 1557 bytes --]

On Sat, 2005-01-22 at 23:30 +0100, Bart De Schuymer wrote:

Hi Bart

> @@ -138,23 +139,34 @@ void nf_log_packet(int pf,
>  /* This is gross, but inline doesn't cut it for avoiding the function
>     call in fast path: gcc doesn't inline (needs value tracking?). --RR */
>  #ifdef CONFIG_NETFILTER_DEBUG
> -#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
> - nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN)
> -#define NF_HOOK_THRESH nf_hook_slow
> +#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			    \
> +({int __ret = 0;							    \
> +if (!nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN, &__ret);  \
> +	__ret = (okfn)(skb);						    \
> +__ret;})
> +#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	    \
> +({int __ret = 0;							    \
> +if (!nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh, &__ret);   \
> +	__ret = (okfn)(skb);						    \
> +__ret;})
>  #else

I guess you never testcompiled with CONFIG_NETFILTER_DEBUG set :)
The if-statements above needs to have ; replaced with )

> +			if (verdict > NF_MAX_VERDICT) {
> +				NFDEBUG("Evil return from %p(%u).\n",
> +				        elem->hook, hook);
> +				continue;
> +			}

Maybe add unlikely() around the test?

Otherwise the changes look sane.

The reoganisation of things in nf_hook_slow() shouldn't cause any
performance changes, I tried to benchmark various variations of that
code some time ago but the result of the changes were more or less in
the noise.

-- 
/Martin

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH/RFC] Reduce call chain length in netfilter (was: Re: do_IRQ: stack overflow: 872..)
  2005-01-22 23:22               ` [Bridge] " Martin Josefsson
@ 2005-01-23 12:40                 ` Bart De Schuymer
  -1 siblings, 0 replies; 53+ messages in thread
From: Bart De Schuymer @ 2005-01-23 12:40 UTC (permalink / raw)
  To: Martin Josefsson
  Cc: David S. Miller, shemminger, dwmw2, ak, snort2004, bridge,
	netdev, Rusty Russell

Op zo, 23-01-2005 te 00:22 +0100, schreef Martin Josefsson:
> On Sat, 2005-01-22 at 23:30 +0100, Bart De Schuymer wrote:
> 
> Hi Bart

Hi Martin,

> I guess you never testcompiled with CONFIG_NETFILTER_DEBUG set :)

Oops, no. I was imagining I could code that without error...
There was another compile error with CONFIG_NETFILTER_DEBUG in the patch
(skb that should have been (*pskb)) :(

> > +			if (verdict > NF_MAX_VERDICT) {
> > +				NFDEBUG("Evil return from %p(%u).\n",
> > +				        elem->hook, hook);
> > +				continue;
> > +			}
> 
> Maybe add unlikely() around the test?

OK. I added another one around another debugging if-statement.

cheers,
Bart

--- linux-2.6.11-rc1/include/linux/netfilter.h.old	2005-01-23 13:31:58.895886808 +0100
+++ linux-2.6.11-rc1/include/linux/netfilter.h	2005-01-23 13:32:02.853285192 +0100
@@ -18,7 +18,8 @@
 #define NF_STOLEN 2
 #define NF_QUEUE 3
 #define NF_REPEAT 4
-#define NF_MAX_VERDICT NF_REPEAT
+#define NF_STOP 5
+#define NF_MAX_VERDICT NF_STOP
 
 /* Generic cache responses from hook functions.
    <= 0x2000 is used for protocol-flags. */
@@ -138,23 +139,34 @@ void nf_log_packet(int pf,
 /* This is gross, but inline doesn't cut it for avoiding the function
    call in fast path: gcc doesn't inline (needs value tracking?). --RR */
 #ifdef CONFIG_NETFILTER_DEBUG
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
- nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN)
-#define NF_HOOK_THRESH nf_hook_slow
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			    \
+({int __ret = 0;							    \
+if (!nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN, &__ret))  \
+	__ret = (okfn)(skb);						    \
+__ret;})
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	    \
+({int __ret = 0;							    \
+if (!nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh, &__ret))   \
+	__ret = (okfn)(skb);						    \
+__ret;})
 #else
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
-(list_empty(&nf_hooks[(pf)][(hook)])					\
- ? (okfn)(skb)								\
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN))
-#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	\
-(list_empty(&nf_hooks[(pf)][(hook)])					\
- ? (okfn)(skb)								\
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), (thresh)))
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			     \
+({int __ret = 0;							     \
+if (list_empty(&nf_hooks[pf][hook]) ||					     \
+    !nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN, &__ret))   \
+	__ret = (okfn)(skb);						     \
+__ret;})
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	     \
+({int __ret = 0;							     \
+if (list_empty(&nf_hooks[pf][hook]) ||					     \
+    !nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh, &__ret))    \
+	__ret = (okfn)(skb);						     \
+__ret;})
 #endif
 
-int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
+int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
 		 struct net_device *indev, struct net_device *outdev,
-		 int (*okfn)(struct sk_buff *), int thresh);
+		 int (*okfn)(struct sk_buff *), int thresh, int *ret);
 
 /* Call setsockopt() */
 int nf_setsockopt(struct sock *sk, int pf, int optval, char __user *opt, 
--- linux-2.6.11-rc1/net/core/netfilter.c.old	2005-01-23 13:31:48.980394192 +0100
+++ linux-2.6.11-rc1/net/core/netfilter.c	2005-01-23 13:32:02.856284736 +0100
@@ -349,6 +349,8 @@ static unsigned int nf_iterate(struct li
 			       int (*okfn)(struct sk_buff *),
 			       int hook_thresh)
 {
+	unsigned int verdict;
+
 	/*
 	 * The caller must not block between calls to this
 	 * function because of risk of continuing from deleted element.
@@ -361,28 +363,18 @@ static unsigned int nf_iterate(struct li
 
 		/* Optimization: we don't need to hold module
                    reference here, since function can't sleep. --RR */
-		switch (elem->hook(hook, skb, indev, outdev, okfn)) {
-		case NF_QUEUE:
-			return NF_QUEUE;
-
-		case NF_STOLEN:
-			return NF_STOLEN;
-
-		case NF_DROP:
-			return NF_DROP;
-
-		case NF_REPEAT:
-			*i = (*i)->prev;
-			break;
-
+		verdict = elem->hook(hook, skb, indev, outdev, okfn);
+		if (verdict != NF_ACCEPT) {
 #ifdef CONFIG_NETFILTER_DEBUG
-		case NF_ACCEPT:
-			break;
-
-		default:
-			NFDEBUG("Evil return from %p(%u).\n", 
-				elem->hook, hook);
+			if (unlikely(verdict > NF_MAX_VERDICT)) {
+				NFDEBUG("Evil return from %p(%u).\n",
+				        elem->hook, hook);
+				continue;
+			}
 #endif
+			if (verdict != NF_REPEAT)
+				return verdict;
+			*i = (*i)->prev;
 		}
 	}
 	return NF_ACCEPT;
@@ -494,50 +486,47 @@ static int nf_queue(struct sk_buff *skb,
 	return 1;
 }
 
-int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
+/* Returns 0 if okfn() needs to be executed by the caller, -EPERM otherwise.
+ * Assumes *ret==0 when called. On return, *ret!=0 when verdict==NF_DROP */
+int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
 		 struct net_device *indev,
 		 struct net_device *outdev,
 		 int (*okfn)(struct sk_buff *),
-		 int hook_thresh)
+		 int hook_thresh, int *ret)
 {
 	struct list_head *elem;
 	unsigned int verdict;
-	int ret = 0;
+	int ret2 = 0;
 
 	/* We may already have this, but read-locks nest anyway */
 	rcu_read_lock();
 
 #ifdef CONFIG_NETFILTER_DEBUG
-	if (skb->nf_debug & (1 << hook)) {
+	if (unlikely((*pskb)->nf_debug & (1 << hook))) {
 		printk("nf_hook: hook %i already set.\n", hook);
-		nf_dump_skb(pf, skb);
+		nf_dump_skb(pf, *pskb);
 	}
-	skb->nf_debug |= (1 << hook);
+	(*pskb)->nf_debug |= (1 << hook);
 #endif
 
 	elem = &nf_hooks[pf][hook];
  next_hook:
-	verdict = nf_iterate(&nf_hooks[pf][hook], &skb, hook, indev,
+	verdict = nf_iterate(&nf_hooks[pf][hook], pskb, hook, indev,
 			     outdev, &elem, okfn, hook_thresh);
-	if (verdict == NF_QUEUE) {
+	if (verdict == NF_ACCEPT || verdict == NF_STOP)
+		goto unlock;
+	else if (verdict == NF_DROP) {
+		kfree_skb(*pskb);
+		*ret = -EPERM;
+	} else if (verdict == NF_QUEUE) {
 		NFDEBUG("nf_hook: Verdict = QUEUE.\n");
-		if (!nf_queue(skb, elem, pf, hook, indev, outdev, okfn))
+		if (!nf_queue(*pskb, elem, pf, hook, indev, outdev, okfn))
 			goto next_hook;
 	}
-
-	switch (verdict) {
-	case NF_ACCEPT:
-		ret = okfn(skb);
-		break;
-
-	case NF_DROP:
-		kfree_skb(skb);
-		ret = -EPERM;
-		break;
-	}
-
+	ret2 = -EPERM;
+unlock:
 	rcu_read_unlock();
-	return ret;
+	return ret2;
 }
 
 void nf_reinject(struct sk_buff *skb, struct nf_info *info,
--- linux-2.6.11-rc1/net/bridge/br_netfilter.c.old	2005-01-23 13:31:39.080899144 +0100
+++ linux-2.6.11-rc1/net/bridge/br_netfilter.c	2005-01-23 13:32:02.861283976 +0100
@@ -829,8 +829,7 @@ static unsigned int ip_sabotage_in(unsig
 {
 	if ((*pskb)->nf_bridge &&
 	    !((*pskb)->nf_bridge->mask & BRNF_NF_BRIDGE_PREROUTING)) {
-		okfn(*pskb);
-		return NF_STOLEN;
+		return NF_STOP;
 	}
 
 	return NF_ACCEPT;
@@ -888,8 +887,7 @@ static unsigned int ip_sabotage_out(unsi
 		if (out->priv_flags & IFF_802_1Q_VLAN)
 			nf_bridge->netoutdev = (struct net_device *)out;
 #endif
-		okfn(skb);
-		return NF_STOLEN;
+		return NF_STOP;
 	}
 
 	return NF_ACCEPT;

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: [PATCH/RFC] Reduce call chain length in netfilter (was: Re: do_IRQ: stack overflow: 872..)
@ 2005-01-23 12:40                 ` Bart De Schuymer
  0 siblings, 0 replies; 53+ messages in thread
From: Bart De Schuymer @ 2005-01-23 12:40 UTC (permalink / raw)
  To: Martin Josefsson
  Cc: netdev, bridge, snort2004, Rusty Russell, dwmw2, David S. Miller,
	shemminger, ak

Op zo, 23-01-2005 te 00:22 +0100, schreef Martin Josefsson:
> On Sat, 2005-01-22 at 23:30 +0100, Bart De Schuymer wrote:
> 
> Hi Bart

Hi Martin,

> I guess you never testcompiled with CONFIG_NETFILTER_DEBUG set :)

Oops, no. I was imagining I could code that without error...
There was another compile error with CONFIG_NETFILTER_DEBUG in the patch
(skb that should have been (*pskb)) :(

> > +			if (verdict > NF_MAX_VERDICT) {
> > +				NFDEBUG("Evil return from %p(%u).\n",
> > +				        elem->hook, hook);
> > +				continue;
> > +			}
> 
> Maybe add unlikely() around the test?

OK. I added another one around another debugging if-statement.

cheers,
Bart

--- linux-2.6.11-rc1/include/linux/netfilter.h.old	2005-01-23 13:31:58.895886808 +0100
+++ linux-2.6.11-rc1/include/linux/netfilter.h	2005-01-23 13:32:02.853285192 +0100
@@ -18,7 +18,8 @@
 #define NF_STOLEN 2
 #define NF_QUEUE 3
 #define NF_REPEAT 4
-#define NF_MAX_VERDICT NF_REPEAT
+#define NF_STOP 5
+#define NF_MAX_VERDICT NF_STOP
 
 /* Generic cache responses from hook functions.
    <= 0x2000 is used for protocol-flags. */
@@ -138,23 +139,34 @@ void nf_log_packet(int pf,
 /* This is gross, but inline doesn't cut it for avoiding the function
    call in fast path: gcc doesn't inline (needs value tracking?). --RR */
 #ifdef CONFIG_NETFILTER_DEBUG
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
- nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN)
-#define NF_HOOK_THRESH nf_hook_slow
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			    \
+({int __ret = 0;							    \
+if (!nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN, &__ret))  \
+	__ret = (okfn)(skb);						    \
+__ret;})
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	    \
+({int __ret = 0;							    \
+if (!nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh, &__ret))   \
+	__ret = (okfn)(skb);						    \
+__ret;})
 #else
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
-(list_empty(&nf_hooks[(pf)][(hook)])					\
- ? (okfn)(skb)								\
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN))
-#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	\
-(list_empty(&nf_hooks[(pf)][(hook)])					\
- ? (okfn)(skb)								\
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), (thresh)))
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			     \
+({int __ret = 0;							     \
+if (list_empty(&nf_hooks[pf][hook]) ||					     \
+    !nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN, &__ret))   \
+	__ret = (okfn)(skb);						     \
+__ret;})
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	     \
+({int __ret = 0;							     \
+if (list_empty(&nf_hooks[pf][hook]) ||					     \
+    !nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh, &__ret))    \
+	__ret = (okfn)(skb);						     \
+__ret;})
 #endif
 
-int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
+int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
 		 struct net_device *indev, struct net_device *outdev,
-		 int (*okfn)(struct sk_buff *), int thresh);
+		 int (*okfn)(struct sk_buff *), int thresh, int *ret);
 
 /* Call setsockopt() */
 int nf_setsockopt(struct sock *sk, int pf, int optval, char __user *opt, 
--- linux-2.6.11-rc1/net/core/netfilter.c.old	2005-01-23 13:31:48.980394192 +0100
+++ linux-2.6.11-rc1/net/core/netfilter.c	2005-01-23 13:32:02.856284736 +0100
@@ -349,6 +349,8 @@ static unsigned int nf_iterate(struct li
 			       int (*okfn)(struct sk_buff *),
 			       int hook_thresh)
 {
+	unsigned int verdict;
+
 	/*
 	 * The caller must not block between calls to this
 	 * function because of risk of continuing from deleted element.
@@ -361,28 +363,18 @@ static unsigned int nf_iterate(struct li
 
 		/* Optimization: we don't need to hold module
                    reference here, since function can't sleep. --RR */
-		switch (elem->hook(hook, skb, indev, outdev, okfn)) {
-		case NF_QUEUE:
-			return NF_QUEUE;
-
-		case NF_STOLEN:
-			return NF_STOLEN;
-
-		case NF_DROP:
-			return NF_DROP;
-
-		case NF_REPEAT:
-			*i = (*i)->prev;
-			break;
-
+		verdict = elem->hook(hook, skb, indev, outdev, okfn);
+		if (verdict != NF_ACCEPT) {
 #ifdef CONFIG_NETFILTER_DEBUG
-		case NF_ACCEPT:
-			break;
-
-		default:
-			NFDEBUG("Evil return from %p(%u).\n", 
-				elem->hook, hook);
+			if (unlikely(verdict > NF_MAX_VERDICT)) {
+				NFDEBUG("Evil return from %p(%u).\n",
+				        elem->hook, hook);
+				continue;
+			}
 #endif
+			if (verdict != NF_REPEAT)
+				return verdict;
+			*i = (*i)->prev;
 		}
 	}
 	return NF_ACCEPT;
@@ -494,50 +486,47 @@ static int nf_queue(struct sk_buff *skb,
 	return 1;
 }
 
-int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
+/* Returns 0 if okfn() needs to be executed by the caller, -EPERM otherwise.
+ * Assumes *ret==0 when called. On return, *ret!=0 when verdict==NF_DROP */
+int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
 		 struct net_device *indev,
 		 struct net_device *outdev,
 		 int (*okfn)(struct sk_buff *),
-		 int hook_thresh)
+		 int hook_thresh, int *ret)
 {
 	struct list_head *elem;
 	unsigned int verdict;
-	int ret = 0;
+	int ret2 = 0;
 
 	/* We may already have this, but read-locks nest anyway */
 	rcu_read_lock();
 
 #ifdef CONFIG_NETFILTER_DEBUG
-	if (skb->nf_debug & (1 << hook)) {
+	if (unlikely((*pskb)->nf_debug & (1 << hook))) {
 		printk("nf_hook: hook %i already set.\n", hook);
-		nf_dump_skb(pf, skb);
+		nf_dump_skb(pf, *pskb);
 	}
-	skb->nf_debug |= (1 << hook);
+	(*pskb)->nf_debug |= (1 << hook);
 #endif
 
 	elem = &nf_hooks[pf][hook];
  next_hook:
-	verdict = nf_iterate(&nf_hooks[pf][hook], &skb, hook, indev,
+	verdict = nf_iterate(&nf_hooks[pf][hook], pskb, hook, indev,
 			     outdev, &elem, okfn, hook_thresh);
-	if (verdict == NF_QUEUE) {
+	if (verdict == NF_ACCEPT || verdict == NF_STOP)
+		goto unlock;
+	else if (verdict == NF_DROP) {
+		kfree_skb(*pskb);
+		*ret = -EPERM;
+	} else if (verdict == NF_QUEUE) {
 		NFDEBUG("nf_hook: Verdict = QUEUE.\n");
-		if (!nf_queue(skb, elem, pf, hook, indev, outdev, okfn))
+		if (!nf_queue(*pskb, elem, pf, hook, indev, outdev, okfn))
 			goto next_hook;
 	}
-
-	switch (verdict) {
-	case NF_ACCEPT:
-		ret = okfn(skb);
-		break;
-
-	case NF_DROP:
-		kfree_skb(skb);
-		ret = -EPERM;
-		break;
-	}
-
+	ret2 = -EPERM;
+unlock:
 	rcu_read_unlock();
-	return ret;
+	return ret2;
 }
 
 void nf_reinject(struct sk_buff *skb, struct nf_info *info,
--- linux-2.6.11-rc1/net/bridge/br_netfilter.c.old	2005-01-23 13:31:39.080899144 +0100
+++ linux-2.6.11-rc1/net/bridge/br_netfilter.c	2005-01-23 13:32:02.861283976 +0100
@@ -829,8 +829,7 @@ static unsigned int ip_sabotage_in(unsig
 {
 	if ((*pskb)->nf_bridge &&
 	    !((*pskb)->nf_bridge->mask & BRNF_NF_BRIDGE_PREROUTING)) {
-		okfn(*pskb);
-		return NF_STOLEN;
+		return NF_STOP;
 	}
 
 	return NF_ACCEPT;
@@ -888,8 +887,7 @@ static unsigned int ip_sabotage_out(unsi
 		if (out->priv_flags & IFF_802_1Q_VLAN)
 			nf_bridge->netoutdev = (struct net_device *)out;
 #endif
-		okfn(skb);
-		return NF_STOLEN;
+		return NF_STOP;
 	}
 
 	return NF_ACCEPT;



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH/RFC] Reduce call chain length in netfilter (was: Re: do_IRQ: stack overflow: 872..)
  2005-01-23 12:40                 ` [Bridge] " Bart De Schuymer
@ 2005-01-23 16:08                   ` Martin Josefsson
  -1 siblings, 0 replies; 53+ messages in thread
From: Martin Josefsson @ 2005-01-23 16:08 UTC (permalink / raw)
  To: Bart De Schuymer
  Cc: David S. Miller, shemminger, dwmw2, ak, snort2004, bridge,
	netdev, Rusty Russell

[-- Attachment #1: Type: text/plain, Size: 867 bytes --]

On Sun, 2005-01-23 at 13:40 +0100, Bart De Schuymer wrote:

> Hi Martin,

Hi Bart

> > I guess you never testcompiled with CONFIG_NETFILTER_DEBUG set :)
> 
> Oops, no. I was imagining I could code that without error...
> There was another compile error with CONFIG_NETFILTER_DEBUG in the patch
> (skb that should have been (*pskb)) :(

:)

> > > +			if (verdict > NF_MAX_VERDICT) {
> > > +				NFDEBUG("Evil return from %p(%u).\n",
> > > +				        elem->hook, hook);
> > > +				continue;
> > > +			}
> > 
> > Maybe add unlikely() around the test?
> 
> OK. I added another one around another debugging if-statement.

I'm now running a kernel with this patch and everything seems to still
be working.
So unless someone else has something to comment I think this should be
applied.
The decrease in call-depth is important.

-- 
/Martin

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: [PATCH/RFC] Reduce call chain length in netfilter (was: Re: do_IRQ: stack overflow: 872..)
@ 2005-01-23 16:08                   ` Martin Josefsson
  0 siblings, 0 replies; 53+ messages in thread
From: Martin Josefsson @ 2005-01-23 16:08 UTC (permalink / raw)
  To: Bart De Schuymer
  Cc: netdev, bridge, snort2004, Rusty Russell, dwmw2, David S. Miller,
	shemminger, ak

[-- Attachment #1: Type: text/plain, Size: 867 bytes --]

On Sun, 2005-01-23 at 13:40 +0100, Bart De Schuymer wrote:

> Hi Martin,

Hi Bart

> > I guess you never testcompiled with CONFIG_NETFILTER_DEBUG set :)
> 
> Oops, no. I was imagining I could code that without error...
> There was another compile error with CONFIG_NETFILTER_DEBUG in the patch
> (skb that should have been (*pskb)) :(

:)

> > > +			if (verdict > NF_MAX_VERDICT) {
> > > +				NFDEBUG("Evil return from %p(%u).\n",
> > > +				        elem->hook, hook);
> > > +				continue;
> > > +			}
> > 
> > Maybe add unlikely() around the test?
> 
> OK. I added another one around another debugging if-statement.

I'm now running a kernel with this patch and everything seems to still
be working.
So unless someone else has something to comment I think this should be
applied.
The decrease in call-depth is important.

-- 
/Martin

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH/RFC] Reduce call chain length in netfilter (was: Re: do_IRQ: stack overflow: 872..)
  2005-01-23 16:08                   ` [Bridge] " Martin Josefsson
@ 2005-01-26  6:05                     ` David S. Miller
  -1 siblings, 0 replies; 53+ messages in thread
From: David S. Miller @ 2005-01-26  6:05 UTC (permalink / raw)
  To: Martin Josefsson
  Cc: bdschuym, shemminger, dwmw2, ak, snort2004, bridge, netdev, rusty

On Sun, 23 Jan 2005 17:08:29 +0100
Martin Josefsson <gandalf@wlug.westbo.se> wrote:

> I'm now running a kernel with this patch and everything seems to still
> be working.
> So unless someone else has something to comment I think this should be
> applied.
> The decrease in call-depth is important.

I would like to see at least one ACK from the netfilter
folks.  Bart or Rusty, could you forward to patch to
netfilter-devel for review?

I have some other ideas about how bridging might be able
to save some call chain depth... but I need to think about
it some more before proposing or even trying to implement.
(basically something akin to how we do route level packet
 output, via dst_output(), but instead we're doing this
 at ->hard_start_xmit() time)

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: [PATCH/RFC] Reduce call chain length in netfilter (was: Re: do_IRQ: stack overflow: 872..)
@ 2005-01-26  6:05                     ` David S. Miller
  0 siblings, 0 replies; 53+ messages in thread
From: David S. Miller @ 2005-01-26  6:05 UTC (permalink / raw)
  To: Martin Josefsson
  Cc: netdev, bridge, snort2004, bdschuym, rusty, dwmw2, ak, shemminger

On Sun, 23 Jan 2005 17:08:29 +0100
Martin Josefsson <gandalf@wlug.westbo.se> wrote:

> I'm now running a kernel with this patch and everything seems to still
> be working.
> So unless someone else has something to comment I think this should be
> applied.
> The decrease in call-depth is important.

I would like to see at least one ACK from the netfilter
folks.  Bart or Rusty, could you forward to patch to
netfilter-devel for review?

I have some other ideas about how bridging might be able
to save some call chain depth... but I need to think about
it some more before proposing or even trying to implement.
(basically something akin to how we do route level packet
 output, via dst_output(), but instead we're doing this
 at ->hard_start_xmit() time)

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH/RFC] Reduce call chain length in netfilter (was: Re: do_IRQ: stack overflow: 872..)
  2005-01-26  6:05                     ` [Bridge] " David S. Miller
@ 2005-01-26  9:08                       ` Bart De Schuymer
  -1 siblings, 0 replies; 53+ messages in thread
From: Bart De Schuymer @ 2005-01-26  9:08 UTC (permalink / raw)
  To: David S. Miller
  Cc: Martin Josefsson, shemminger, dwmw2, ak, snort2004, bridge,
	netdev, rusty, netfilter-devel

Op di, 25-01-2005 te 22:05 -0800, schreef David S. Miller: 
> On Sun, 23 Jan 2005 17:08:29 +0100
> Martin Josefsson <gandalf@wlug.westbo.se> wrote:
> 
> > I'm now running a kernel with this patch and everything seems to still
> > be working.
> > So unless someone else has something to comment I think this should be
> > applied.
> > The decrease in call-depth is important.
> 
> I would like to see at least one ACK from the netfilter
> folks.  Bart or Rusty, could you forward to patch to
> netfilter-devel for review?

AFAIK Martin is in the netfilter core team. Anyway, I just included
netfilter-devel.

Does anyone have objections to this patch, which reduces the netfilter
call chain length?

> I have some other ideas about how bridging might be able
> to save some call chain depth... but I need to think about
> it some more before proposing or even trying to implement.
> (basically something akin to how we do route level packet
>  output, via dst_output(), but instead we're doing this
>  at ->hard_start_xmit() time)

I'm all ears :)


--- linux-2.6.11-rc1/include/linux/netfilter.h.old	2005-01-23 13:31:58.895886808 +0100
+++ linux-2.6.11-rc1/include/linux/netfilter.h	2005-01-23 13:32:02.853285192 +0100
@@ -18,7 +18,8 @@
 #define NF_STOLEN 2
 #define NF_QUEUE 3
 #define NF_REPEAT 4
-#define NF_MAX_VERDICT NF_REPEAT
+#define NF_STOP 5
+#define NF_MAX_VERDICT NF_STOP
 
 /* Generic cache responses from hook functions.
    <= 0x2000 is used for protocol-flags. */
@@ -138,23 +139,34 @@ void nf_log_packet(int pf,
 /* This is gross, but inline doesn't cut it for avoiding the function
    call in fast path: gcc doesn't inline (needs value tracking?). --RR */
 #ifdef CONFIG_NETFILTER_DEBUG
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
- nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN)
-#define NF_HOOK_THRESH nf_hook_slow
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			    \
+({int __ret = 0;							    \
+if (!nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN, &__ret))  \
+	__ret = (okfn)(skb);						    \
+__ret;})
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	    \
+({int __ret = 0;							    \
+if (!nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh, &__ret))   \
+	__ret = (okfn)(skb);						    \
+__ret;})
 #else
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
-(list_empty(&nf_hooks[(pf)][(hook)])					\
- ? (okfn)(skb)								\
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN))
-#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	\
-(list_empty(&nf_hooks[(pf)][(hook)])					\
- ? (okfn)(skb)								\
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), (thresh)))
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			     \
+({int __ret = 0;							     \
+if (list_empty(&nf_hooks[pf][hook]) ||					     \
+    !nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN, &__ret))   \
+	__ret = (okfn)(skb);						     \
+__ret;})
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	     \
+({int __ret = 0;							     \
+if (list_empty(&nf_hooks[pf][hook]) ||					     \
+    !nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh, &__ret))    \
+	__ret = (okfn)(skb);						     \
+__ret;})
 #endif
 
-int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
+int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
 		 struct net_device *indev, struct net_device *outdev,
-		 int (*okfn)(struct sk_buff *), int thresh);
+		 int (*okfn)(struct sk_buff *), int thresh, int *ret);
 
 /* Call setsockopt() */
 int nf_setsockopt(struct sock *sk, int pf, int optval, char __user *opt, 
--- linux-2.6.11-rc1/net/core/netfilter.c.old	2005-01-23 13:31:48.980394192 +0100
+++ linux-2.6.11-rc1/net/core/netfilter.c	2005-01-23 13:32:02.856284736 +0100
@@ -349,6 +349,8 @@ static unsigned int nf_iterate(struct li
 			       int (*okfn)(struct sk_buff *),
 			       int hook_thresh)
 {
+	unsigned int verdict;
+
 	/*
 	 * The caller must not block between calls to this
 	 * function because of risk of continuing from deleted element.
@@ -361,28 +363,18 @@ static unsigned int nf_iterate(struct li
 
 		/* Optimization: we don't need to hold module
                    reference here, since function can't sleep. --RR */
-		switch (elem->hook(hook, skb, indev, outdev, okfn)) {
-		case NF_QUEUE:
-			return NF_QUEUE;
-
-		case NF_STOLEN:
-			return NF_STOLEN;
-
-		case NF_DROP:
-			return NF_DROP;
-
-		case NF_REPEAT:
-			*i = (*i)->prev;
-			break;
-
+		verdict = elem->hook(hook, skb, indev, outdev, okfn);
+		if (verdict != NF_ACCEPT) {
 #ifdef CONFIG_NETFILTER_DEBUG
-		case NF_ACCEPT:
-			break;
-
-		default:
-			NFDEBUG("Evil return from %p(%u).\n", 
-				elem->hook, hook);
+			if (unlikely(verdict > NF_MAX_VERDICT)) {
+				NFDEBUG("Evil return from %p(%u).\n",
+				        elem->hook, hook);
+				continue;
+			}
 #endif
+			if (verdict != NF_REPEAT)
+				return verdict;
+			*i = (*i)->prev;
 		}
 	}
 	return NF_ACCEPT;
@@ -494,50 +486,47 @@ static int nf_queue(struct sk_buff *skb,
 	return 1;
 }
 
-int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
+/* Returns 0 if okfn() needs to be executed by the caller, -EPERM otherwise.
+ * Assumes *ret==0 when called. On return, *ret!=0 when verdict==NF_DROP */
+int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
 		 struct net_device *indev,
 		 struct net_device *outdev,
 		 int (*okfn)(struct sk_buff *),
-		 int hook_thresh)
+		 int hook_thresh, int *ret)
 {
 	struct list_head *elem;
 	unsigned int verdict;
-	int ret = 0;
+	int ret2 = 0;
 
 	/* We may already have this, but read-locks nest anyway */
 	rcu_read_lock();
 
 #ifdef CONFIG_NETFILTER_DEBUG
-	if (skb->nf_debug & (1 << hook)) {
+	if (unlikely((*pskb)->nf_debug & (1 << hook))) {
 		printk("nf_hook: hook %i already set.\n", hook);
-		nf_dump_skb(pf, skb);
+		nf_dump_skb(pf, *pskb);
 	}
-	skb->nf_debug |= (1 << hook);
+	(*pskb)->nf_debug |= (1 << hook);
 #endif
 
 	elem = &nf_hooks[pf][hook];
  next_hook:
-	verdict = nf_iterate(&nf_hooks[pf][hook], &skb, hook, indev,
+	verdict = nf_iterate(&nf_hooks[pf][hook], pskb, hook, indev,
 			     outdev, &elem, okfn, hook_thresh);
-	if (verdict == NF_QUEUE) {
+	if (verdict == NF_ACCEPT || verdict == NF_STOP)
+		goto unlock;
+	else if (verdict == NF_DROP) {
+		kfree_skb(*pskb);
+		*ret = -EPERM;
+	} else if (verdict == NF_QUEUE) {
 		NFDEBUG("nf_hook: Verdict = QUEUE.\n");
-		if (!nf_queue(skb, elem, pf, hook, indev, outdev, okfn))
+		if (!nf_queue(*pskb, elem, pf, hook, indev, outdev, okfn))
 			goto next_hook;
 	}
-
-	switch (verdict) {
-	case NF_ACCEPT:
-		ret = okfn(skb);
-		break;
-
-	case NF_DROP:
-		kfree_skb(skb);
-		ret = -EPERM;
-		break;
-	}
-
+	ret2 = -EPERM;
+unlock:
 	rcu_read_unlock();
-	return ret;
+	return ret2;
 }
 
 void nf_reinject(struct sk_buff *skb, struct nf_info *info,
--- linux-2.6.11-rc1/net/bridge/br_netfilter.c.old	2005-01-23 13:31:39.080899144 +0100
+++ linux-2.6.11-rc1/net/bridge/br_netfilter.c	2005-01-23 13:32:02.861283976 +0100
@@ -829,8 +829,7 @@ static unsigned int ip_sabotage_in(unsig
 {
 	if ((*pskb)->nf_bridge &&
 	    !((*pskb)->nf_bridge->mask & BRNF_NF_BRIDGE_PREROUTING)) {
-		okfn(*pskb);
-		return NF_STOLEN;
+		return NF_STOP;
 	}
 
 	return NF_ACCEPT;
@@ -888,8 +887,7 @@ static unsigned int ip_sabotage_out(unsi
 		if (out->priv_flags & IFF_802_1Q_VLAN)
 			nf_bridge->netoutdev = (struct net_device *)out;
 #endif
-		okfn(skb);
-		return NF_STOLEN;
+		return NF_STOP;
 	}
 
 	return NF_ACCEPT;

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: [PATCH/RFC] Reduce call chain length in netfilter (was: Re: do_IRQ: stack overflow: 872..)
@ 2005-01-26  9:08                       ` Bart De Schuymer
  0 siblings, 0 replies; 53+ messages in thread
From: Bart De Schuymer @ 2005-01-26  9:08 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, netfilter-devel, bridge, snort2004, Martin Josefsson,
	rusty, dwmw2, ak, shemminger

Op di, 25-01-2005 te 22:05 -0800, schreef David S. Miller: 
> On Sun, 23 Jan 2005 17:08:29 +0100
> Martin Josefsson <gandalf@wlug.westbo.se> wrote:
> 
> > I'm now running a kernel with this patch and everything seems to still
> > be working.
> > So unless someone else has something to comment I think this should be
> > applied.
> > The decrease in call-depth is important.
> 
> I would like to see at least one ACK from the netfilter
> folks.  Bart or Rusty, could you forward to patch to
> netfilter-devel for review?

AFAIK Martin is in the netfilter core team. Anyway, I just included
netfilter-devel.

Does anyone have objections to this patch, which reduces the netfilter
call chain length?

> I have some other ideas about how bridging might be able
> to save some call chain depth... but I need to think about
> it some more before proposing or even trying to implement.
> (basically something akin to how we do route level packet
>  output, via dst_output(), but instead we're doing this
>  at ->hard_start_xmit() time)

I'm all ears :)


--- linux-2.6.11-rc1/include/linux/netfilter.h.old	2005-01-23 13:31:58.895886808 +0100
+++ linux-2.6.11-rc1/include/linux/netfilter.h	2005-01-23 13:32:02.853285192 +0100
@@ -18,7 +18,8 @@
 #define NF_STOLEN 2
 #define NF_QUEUE 3
 #define NF_REPEAT 4
-#define NF_MAX_VERDICT NF_REPEAT
+#define NF_STOP 5
+#define NF_MAX_VERDICT NF_STOP
 
 /* Generic cache responses from hook functions.
    <= 0x2000 is used for protocol-flags. */
@@ -138,23 +139,34 @@ void nf_log_packet(int pf,
 /* This is gross, but inline doesn't cut it for avoiding the function
    call in fast path: gcc doesn't inline (needs value tracking?). --RR */
 #ifdef CONFIG_NETFILTER_DEBUG
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
- nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN)
-#define NF_HOOK_THRESH nf_hook_slow
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			    \
+({int __ret = 0;							    \
+if (!nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN, &__ret))  \
+	__ret = (okfn)(skb);						    \
+__ret;})
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	    \
+({int __ret = 0;							    \
+if (!nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh, &__ret))   \
+	__ret = (okfn)(skb);						    \
+__ret;})
 #else
-#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			\
-(list_empty(&nf_hooks[(pf)][(hook)])					\
- ? (okfn)(skb)								\
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN))
-#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	\
-(list_empty(&nf_hooks[(pf)][(hook)])					\
- ? (okfn)(skb)								\
- : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), (thresh)))
+#define NF_HOOK(pf, hook, skb, indev, outdev, okfn)			     \
+({int __ret = 0;							     \
+if (list_empty(&nf_hooks[pf][hook]) ||					     \
+    !nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, INT_MIN, &__ret))   \
+	__ret = (okfn)(skb);						     \
+__ret;})
+#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh)	     \
+({int __ret = 0;							     \
+if (list_empty(&nf_hooks[pf][hook]) ||					     \
+    !nf_hook_slow(pf, hook, &(skb), indev, outdev, okfn, thresh, &__ret))    \
+	__ret = (okfn)(skb);						     \
+__ret;})
 #endif
 
-int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
+int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
 		 struct net_device *indev, struct net_device *outdev,
-		 int (*okfn)(struct sk_buff *), int thresh);
+		 int (*okfn)(struct sk_buff *), int thresh, int *ret);
 
 /* Call setsockopt() */
 int nf_setsockopt(struct sock *sk, int pf, int optval, char __user *opt, 
--- linux-2.6.11-rc1/net/core/netfilter.c.old	2005-01-23 13:31:48.980394192 +0100
+++ linux-2.6.11-rc1/net/core/netfilter.c	2005-01-23 13:32:02.856284736 +0100
@@ -349,6 +349,8 @@ static unsigned int nf_iterate(struct li
 			       int (*okfn)(struct sk_buff *),
 			       int hook_thresh)
 {
+	unsigned int verdict;
+
 	/*
 	 * The caller must not block between calls to this
 	 * function because of risk of continuing from deleted element.
@@ -361,28 +363,18 @@ static unsigned int nf_iterate(struct li
 
 		/* Optimization: we don't need to hold module
                    reference here, since function can't sleep. --RR */
-		switch (elem->hook(hook, skb, indev, outdev, okfn)) {
-		case NF_QUEUE:
-			return NF_QUEUE;
-
-		case NF_STOLEN:
-			return NF_STOLEN;
-
-		case NF_DROP:
-			return NF_DROP;
-
-		case NF_REPEAT:
-			*i = (*i)->prev;
-			break;
-
+		verdict = elem->hook(hook, skb, indev, outdev, okfn);
+		if (verdict != NF_ACCEPT) {
 #ifdef CONFIG_NETFILTER_DEBUG
-		case NF_ACCEPT:
-			break;
-
-		default:
-			NFDEBUG("Evil return from %p(%u).\n", 
-				elem->hook, hook);
+			if (unlikely(verdict > NF_MAX_VERDICT)) {
+				NFDEBUG("Evil return from %p(%u).\n",
+				        elem->hook, hook);
+				continue;
+			}
 #endif
+			if (verdict != NF_REPEAT)
+				return verdict;
+			*i = (*i)->prev;
 		}
 	}
 	return NF_ACCEPT;
@@ -494,50 +486,47 @@ static int nf_queue(struct sk_buff *skb,
 	return 1;
 }
 
-int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
+/* Returns 0 if okfn() needs to be executed by the caller, -EPERM otherwise.
+ * Assumes *ret==0 when called. On return, *ret!=0 when verdict==NF_DROP */
+int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
 		 struct net_device *indev,
 		 struct net_device *outdev,
 		 int (*okfn)(struct sk_buff *),
-		 int hook_thresh)
+		 int hook_thresh, int *ret)
 {
 	struct list_head *elem;
 	unsigned int verdict;
-	int ret = 0;
+	int ret2 = 0;
 
 	/* We may already have this, but read-locks nest anyway */
 	rcu_read_lock();
 
 #ifdef CONFIG_NETFILTER_DEBUG
-	if (skb->nf_debug & (1 << hook)) {
+	if (unlikely((*pskb)->nf_debug & (1 << hook))) {
 		printk("nf_hook: hook %i already set.\n", hook);
-		nf_dump_skb(pf, skb);
+		nf_dump_skb(pf, *pskb);
 	}
-	skb->nf_debug |= (1 << hook);
+	(*pskb)->nf_debug |= (1 << hook);
 #endif
 
 	elem = &nf_hooks[pf][hook];
  next_hook:
-	verdict = nf_iterate(&nf_hooks[pf][hook], &skb, hook, indev,
+	verdict = nf_iterate(&nf_hooks[pf][hook], pskb, hook, indev,
 			     outdev, &elem, okfn, hook_thresh);
-	if (verdict == NF_QUEUE) {
+	if (verdict == NF_ACCEPT || verdict == NF_STOP)
+		goto unlock;
+	else if (verdict == NF_DROP) {
+		kfree_skb(*pskb);
+		*ret = -EPERM;
+	} else if (verdict == NF_QUEUE) {
 		NFDEBUG("nf_hook: Verdict = QUEUE.\n");
-		if (!nf_queue(skb, elem, pf, hook, indev, outdev, okfn))
+		if (!nf_queue(*pskb, elem, pf, hook, indev, outdev, okfn))
 			goto next_hook;
 	}
-
-	switch (verdict) {
-	case NF_ACCEPT:
-		ret = okfn(skb);
-		break;
-
-	case NF_DROP:
-		kfree_skb(skb);
-		ret = -EPERM;
-		break;
-	}
-
+	ret2 = -EPERM;
+unlock:
 	rcu_read_unlock();
-	return ret;
+	return ret2;
 }
 
 void nf_reinject(struct sk_buff *skb, struct nf_info *info,
--- linux-2.6.11-rc1/net/bridge/br_netfilter.c.old	2005-01-23 13:31:39.080899144 +0100
+++ linux-2.6.11-rc1/net/bridge/br_netfilter.c	2005-01-23 13:32:02.861283976 +0100
@@ -829,8 +829,7 @@ static unsigned int ip_sabotage_in(unsig
 {
 	if ((*pskb)->nf_bridge &&
 	    !((*pskb)->nf_bridge->mask & BRNF_NF_BRIDGE_PREROUTING)) {
-		okfn(*pskb);
-		return NF_STOLEN;
+		return NF_STOP;
 	}
 
 	return NF_ACCEPT;
@@ -888,8 +887,7 @@ static unsigned int ip_sabotage_out(unsi
 		if (out->priv_flags & IFF_802_1Q_VLAN)
 			nf_bridge->netoutdev = (struct net_device *)out;
 #endif
-		okfn(skb);
-		return NF_STOLEN;
+		return NF_STOP;
 	}
 
 	return NF_ACCEPT;




^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH/RFC] Reduce call chain length in netfilter
  2005-01-26  9:08                       ` [Bridge] " Bart De Schuymer
@ 2005-01-26 23:49                         ` Patrick McHardy
  -1 siblings, 0 replies; 53+ messages in thread
From: Patrick McHardy @ 2005-01-26 23:49 UTC (permalink / raw)
  To: Bart De Schuymer
  Cc: David S. Miller, netdev, netfilter-devel, snort2004, rusty, ak,
	bridge, Martin Josefsson, dwmw2, shemminger

Bart De Schuymer wrote:

>Does anyone have objections to this patch, which reduces the netfilter
>call chain length?
>
Looks fine to me.

Signed-off-by: Patrick McHardy <kaber@trash.net>

Regards
Patrick

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: [PATCH/RFC] Reduce call chain length in netfilter
@ 2005-01-26 23:49                         ` Patrick McHardy
  0 siblings, 0 replies; 53+ messages in thread
From: Patrick McHardy @ 2005-01-26 23:49 UTC (permalink / raw)
  To: Bart De Schuymer
  Cc: netdev, netfilter-devel, bridge, snort2004, Martin Josefsson,
	rusty, dwmw2, ak, shemminger, David S. Miller

Bart De Schuymer wrote:

>Does anyone have objections to this patch, which reduces the netfilter
>call chain length?
>
Looks fine to me.

Signed-off-by: Patrick McHardy <kaber@trash.net>

Regards
Patrick


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH/RFC] Reduce call chain length in netfilter
  2005-01-26 23:49                         ` [Bridge] " Patrick McHardy
@ 2005-01-27  7:18                           ` David S. Miller
  -1 siblings, 0 replies; 53+ messages in thread
From: David S. Miller @ 2005-01-27  7:18 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: bdschuym, netdev, netfilter-devel, snort2004, rusty, ak, bridge,
	gandalf, dwmw2, shemminger

On Thu, 27 Jan 2005 00:49:01 +0100
Patrick McHardy <kaber@trash.net> wrote:

> Bart De Schuymer wrote:
> 
> >Does anyone have objections to this patch, which reduces the netfilter
> >call chain length?
> >
> Looks fine to me.
> 
> Signed-off-by: Patrick McHardy <kaber@trash.net>

Ok, I applied this.

While reviewing I thought it may be an issue that the new macros
potentially change skb.  It really isn't an issue because NF_HOOK()
calls pass ownership of the SKB over from the caller.

Although technically, someone could go:

	skb_get(skb);
	err = NF_HOOK(... skb ...);
	... do stuff with skb ...
	kfree_skb(skb);

but that would cause other problems and I audited the entire tree
and nobody attempts anything like this currently.  'skb' always
dies at the NF_HOOK() call site.

I guess if we wanted to preserve NF_HOOK*() semantics even in such
a case we could use a local "__skb" var in the macro's basic block.

Another huge downside to this change I was worried about
was from a code generation point of view.  Since we now take the
address of "skb", gcc cannot generate tail-calls for the common
case of:

	return NF_HOOK(...);

when netfilter is enabled.  Ho hum...

Wait...

This is actually an important point!  Since gcc is generating a tail-
call for NF_HOOK() today, there is no stack savings for NF_HOOK()
created by this patch.  The only real gain is the NF_STOP stuff
for bridge netfilter.

I'm backing this out of my tree, let's think about this some more.
Perhaps it's only worth adding the NF_STOP thing and just making
nf_hook_slow() do the okfn(skb); call in that case?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: [PATCH/RFC] Reduce call chain length in netfilter
@ 2005-01-27  7:18                           ` David S. Miller
  0 siblings, 0 replies; 53+ messages in thread
From: David S. Miller @ 2005-01-27  7:18 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: netdev, netfilter-devel, bdschuym, bridge, snort2004, rusty,
	dwmw2, ak, shemminger, gandalf

On Thu, 27 Jan 2005 00:49:01 +0100
Patrick McHardy <kaber@trash.net> wrote:

> Bart De Schuymer wrote:
> 
> >Does anyone have objections to this patch, which reduces the netfilter
> >call chain length?
> >
> Looks fine to me.
> 
> Signed-off-by: Patrick McHardy <kaber@trash.net>

Ok, I applied this.

While reviewing I thought it may be an issue that the new macros
potentially change skb.  It really isn't an issue because NF_HOOK()
calls pass ownership of the SKB over from the caller.

Although technically, someone could go:

	skb_get(skb);
	err = NF_HOOK(... skb ...);
	... do stuff with skb ...
	kfree_skb(skb);

but that would cause other problems and I audited the entire tree
and nobody attempts anything like this currently.  'skb' always
dies at the NF_HOOK() call site.

I guess if we wanted to preserve NF_HOOK*() semantics even in such
a case we could use a local "__skb" var in the macro's basic block.

Another huge downside to this change I was worried about
was from a code generation point of view.  Since we now take the
address of "skb", gcc cannot generate tail-calls for the common
case of:

	return NF_HOOK(...);

when netfilter is enabled.  Ho hum...

Wait...

This is actually an important point!  Since gcc is generating a tail-
call for NF_HOOK() today, there is no stack savings for NF_HOOK()
created by this patch.  The only real gain is the NF_STOP stuff
for bridge netfilter.

I'm backing this out of my tree, let's think about this some more.
Perhaps it's only worth adding the NF_STOP thing and just making
nf_hook_slow() do the okfn(skb); call in that case?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH/RFC] Reduce call chain length in netfilter
  2005-01-27  7:18                           ` [Bridge] " David S. Miller
@ 2005-01-27 17:50                             ` Patrick McHardy
  -1 siblings, 0 replies; 53+ messages in thread
From: Patrick McHardy @ 2005-01-27 17:50 UTC (permalink / raw)
  To: David S. Miller
  Cc: bdschuym, netdev, netfilter-devel, snort2004, rusty, ak, bridge,
	gandalf, dwmw2, shemminger

David S. Miller wrote:

>While reviewing I thought it may be an issue that the new macros
>potentially change skb.  It really isn't an issue because NF_HOOK()
>calls pass ownership of the SKB over from the caller.
>
>Although technically, someone could go:
>
>	skb_get(skb);
>	err = NF_HOOK(... skb ...);
>	... do stuff with skb ...
>	kfree_skb(skb);
>
>but that would cause other problems and I audited the entire tree
>and nobody attempts anything like this currently.  'skb' always
>dies at the NF_HOOK() call site.
>
Yes, it has always been illegal to use the skb after NF_HOOK.

>Another huge downside to this change I was worried about
>was from a code generation point of view.  Since we now take the
>address of "skb", gcc cannot generate tail-calls for the common
>case of:
>
>	return NF_HOOK(...);
>
>when netfilter is enabled.  Ho hum...
>
 From what I can see it doesn't generate tail-calls currently:

 34c:   45 31 c0                xor    %r8d,%r8d
 34f:   4c 89 e2                mov    %r12,%rdx
 352:   be 01 00 00 00          mov    $0x1,%esi
 357:   bf 02 00 00 00          mov    $0x2,%edi
 35c:   c7 04 24 00 00 00 80    movl   $0x80000000,(%rsp)
 363:   e8 00 00 00 00          callq  368 <ip_local_deliver+0x248>
                        364: R_X86_64_PC32      
nf_hook_slow+0xfffffffffffffffc
 368:   48 83 c4 10             add    $0x10,%rsp
 36c:   5b                      pop    %rbx
 36d:   5d                      pop    %rbp
 36e:   41 5c                   pop    %r12
 370:   c3                      retq

According to something I found on the internet, gcc only optimizes
tail-calls if some conditions are met, in this case most importantly
the space required for the arguments to the function called at the tail
must not exceed the space required for the arguments of the function
itself. nf_hook_slow takes 6 arguments, probably more than any caller.

Regards
Patrick

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: [PATCH/RFC] Reduce call chain length in netfilter
@ 2005-01-27 17:50                             ` Patrick McHardy
  0 siblings, 0 replies; 53+ messages in thread
From: Patrick McHardy @ 2005-01-27 17:50 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, netfilter-devel, bdschuym, bridge, snort2004, rusty,
	dwmw2, ak, shemminger, gandalf

David S. Miller wrote:

>While reviewing I thought it may be an issue that the new macros
>potentially change skb.  It really isn't an issue because NF_HOOK()
>calls pass ownership of the SKB over from the caller.
>
>Although technically, someone could go:
>
>	skb_get(skb);
>	err = NF_HOOK(... skb ...);
>	... do stuff with skb ...
>	kfree_skb(skb);
>
>but that would cause other problems and I audited the entire tree
>and nobody attempts anything like this currently.  'skb' always
>dies at the NF_HOOK() call site.
>
Yes, it has always been illegal to use the skb after NF_HOOK.

>Another huge downside to this change I was worried about
>was from a code generation point of view.  Since we now take the
>address of "skb", gcc cannot generate tail-calls for the common
>case of:
>
>	return NF_HOOK(...);
>
>when netfilter is enabled.  Ho hum...
>
 From what I can see it doesn't generate tail-calls currently:

 34c:   45 31 c0                xor    %r8d,%r8d
 34f:   4c 89 e2                mov    %r12,%rdx
 352:   be 01 00 00 00          mov    $0x1,%esi
 357:   bf 02 00 00 00          mov    $0x2,%edi
 35c:   c7 04 24 00 00 00 80    movl   $0x80000000,(%rsp)
 363:   e8 00 00 00 00          callq  368 <ip_local_deliver+0x248>
                        364: R_X86_64_PC32      
nf_hook_slow+0xfffffffffffffffc
 368:   48 83 c4 10             add    $0x10,%rsp
 36c:   5b                      pop    %rbx
 36d:   5d                      pop    %rbp
 36e:   41 5c                   pop    %r12
 370:   c3                      retq

According to something I found on the internet, gcc only optimizes
tail-calls if some conditions are met, in this case most importantly
the space required for the arguments to the function called at the tail
must not exceed the space required for the arguments of the function
itself. nf_hook_slow takes 6 arguments, probably more than any caller.

Regards
Patrick


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH/RFC] Reduce call chain length in netfilter
  2005-01-27 17:50                             ` [Bridge] " Patrick McHardy
@ 2005-01-27 19:47                               ` David S. Miller
  -1 siblings, 0 replies; 53+ messages in thread
From: David S. Miller @ 2005-01-27 19:47 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: bdschuym, netdev, netfilter-devel, snort2004, rusty, ak, bridge,
	gandalf, dwmw2, shemminger

On Thu, 27 Jan 2005 18:50:50 +0100
Patrick McHardy <kaber@trash.net> wrote:

>  From what I can see it doesn't generate tail-calls currently:

Indeed... It even doesn't do this on Sparc64 either, even for
the okfn(skb) call which I was sure it would.

It won't tail-call for function pointers for some strance reason
as exhibited by this simple test:

struct sk_buff {
	int foo;
};

int invoke(struct sk_buff *skb, int (*okfn)(struct sk_buff *))
{
	return okfn(skb);
}

extern int test_func(struct sk_buff *);

int invoke2(struct sk_buff *skb)
{
	return test_func(skb);
}

In the generated asm on sparc64, invoke2() gets a tail-call
whereas invoke() does not.  Hmmm...

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: [PATCH/RFC] Reduce call chain length in netfilter
@ 2005-01-27 19:47                               ` David S. Miller
  0 siblings, 0 replies; 53+ messages in thread
From: David S. Miller @ 2005-01-27 19:47 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: netdev, netfilter-devel, bdschuym, bridge, snort2004, rusty,
	dwmw2, ak, shemminger, gandalf

On Thu, 27 Jan 2005 18:50:50 +0100
Patrick McHardy <kaber@trash.net> wrote:

>  From what I can see it doesn't generate tail-calls currently:

Indeed... It even doesn't do this on Sparc64 either, even for
the okfn(skb) call which I was sure it would.

It won't tail-call for function pointers for some strance reason
as exhibited by this simple test:

struct sk_buff {
	int foo;
};

int invoke(struct sk_buff *skb, int (*okfn)(struct sk_buff *))
{
	return okfn(skb);
}

extern int test_func(struct sk_buff *);

int invoke2(struct sk_buff *skb)
{
	return test_func(skb);
}

In the generated asm on sparc64, invoke2() gets a tail-call
whereas invoke() does not.  Hmmm...

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH/RFC] Reduce call chain length in netfilter
  2005-01-27 19:47                               ` [Bridge] " David S. Miller
@ 2005-01-27 21:16                                 ` Bart De Schuymer
  -1 siblings, 0 replies; 53+ messages in thread
From: Bart De Schuymer @ 2005-01-27 21:16 UTC (permalink / raw)
  To: David S. Miller
  Cc: Patrick McHardy, netdev, netfilter-devel, snort2004, rusty, ak,
	bridge, gandalf, dwmw2, shemminger

Op do, 27-01-2005 te 11:47 -0800, schreef David S. Miller:
> On Thu, 27 Jan 2005 18:50:50 +0100
> Patrick McHardy <kaber@trash.net> wrote:
> 
> >  From what I can see it doesn't generate tail-calls currently:
> 
> Indeed... It even doesn't do this on Sparc64 either, even for
> the okfn(skb) call which I was sure it would.
> 
> It won't tail-call for function pointers for some strance reason
> as exhibited by this simple test:
> 
> struct sk_buff {
> 	int foo;
> };
> 
> int invoke(struct sk_buff *skb, int (*okfn)(struct sk_buff *))
> {
> 	return okfn(skb);
> }
> 
> extern int test_func(struct sk_buff *);
> 
> int invoke2(struct sk_buff *skb)
> {
> 	return test_func(skb);
> }
> 
> In the generated asm on sparc64, invoke2() gets a tail-call
> whereas invoke() does not.  Hmmm...

Pasha (<snort2004@mail.ru>) is currently using a bridge-nf patch vs
2.4.29 with the changes I sent to you. After two days he sent me
(yesterday) a message that all is well. Without the patch he was getting
the stack overflow oopses.

cheers,
Bart

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: [PATCH/RFC] Reduce call chain length in netfilter
@ 2005-01-27 21:16                                 ` Bart De Schuymer
  0 siblings, 0 replies; 53+ messages in thread
From: Bart De Schuymer @ 2005-01-27 21:16 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, netfilter-devel, bridge, snort2004, gandalf, rusty,
	dwmw2, ak, shemminger

Op do, 27-01-2005 te 11:47 -0800, schreef David S. Miller:
> On Thu, 27 Jan 2005 18:50:50 +0100
> Patrick McHardy <kaber@trash.net> wrote:
> 
> >  From what I can see it doesn't generate tail-calls currently:
> 
> Indeed... It even doesn't do this on Sparc64 either, even for
> the okfn(skb) call which I was sure it would.
> 
> It won't tail-call for function pointers for some strance reason
> as exhibited by this simple test:
> 
> struct sk_buff {
> 	int foo;
> };
> 
> int invoke(struct sk_buff *skb, int (*okfn)(struct sk_buff *))
> {
> 	return okfn(skb);
> }
> 
> extern int test_func(struct sk_buff *);
> 
> int invoke2(struct sk_buff *skb)
> {
> 	return test_func(skb);
> }
> 
> In the generated asm on sparc64, invoke2() gets a tail-call
> whereas invoke() does not.  Hmmm...

Pasha (<snort2004@mail.ru>) is currently using a bridge-nf patch vs
2.4.29 with the changes I sent to you. After two days he sent me
(yesterday) a message that all is well. Without the patch he was getting
the stack overflow oopses.

cheers,
Bart



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH/RFC] Reduce call chain length in netfilter
  2005-01-27 19:47                               ` [Bridge] " David S. Miller
@ 2005-01-27 22:48                                 ` Patrick McHardy
  -1 siblings, 0 replies; 53+ messages in thread
From: Patrick McHardy @ 2005-01-27 22:48 UTC (permalink / raw)
  To: David S. Miller
  Cc: bdschuym, netdev, netfilter-devel, snort2004, rusty, ak, bridge,
	gandalf, dwmw2, shemminger

David S. Miller wrote:

>In the generated asm on sparc64, invoke2() gets a tail-call
>whereas invoke() does not.  Hmmm...
>
>
Apparently support for indirect tail-calls was added to gcc 3.4.

gcc-3.3.5:
0x0000000000000000 <invoke+0>:  sub    $0x8,%rsp
0x0000000000000004 <invoke+4>:  callq  *%esi
0x0000000000000006 <invoke+6>:  add    $0x8,%rsp
0x000000000000000a <invoke+10>: retq  

gcc-3.4.4:
0x0000000000000000 <invoke+0>:  mov    %rsi,%r11
0x0000000000000003 <invoke+3>:  jmpq   *%r11d

Regards
Patrick

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: [PATCH/RFC] Reduce call chain length in netfilter
@ 2005-01-27 22:48                                 ` Patrick McHardy
  0 siblings, 0 replies; 53+ messages in thread
From: Patrick McHardy @ 2005-01-27 22:48 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, netfilter-devel, bdschuym, bridge, snort2004, rusty,
	dwmw2, ak, shemminger, gandalf

David S. Miller wrote:

>In the generated asm on sparc64, invoke2() gets a tail-call
>whereas invoke() does not.  Hmmm...
>
>
Apparently support for indirect tail-calls was added to gcc 3.4.

gcc-3.3.5:
0x0000000000000000 <invoke+0>:  sub    $0x8,%rsp
0x0000000000000004 <invoke+4>:  callq  *%esi
0x0000000000000006 <invoke+6>:  add    $0x8,%rsp
0x000000000000000a <invoke+10>: retq  

gcc-3.4.4:
0x0000000000000000 <invoke+0>:  mov    %rsi,%r11
0x0000000000000003 <invoke+3>:  jmpq   *%r11d

Regards
Patrick


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH/RFC] Reduce call chain length in netfilter
  2005-01-27 22:48                                 ` [Bridge] " Patrick McHardy
@ 2005-01-27 23:24                                   ` David S. Miller
  -1 siblings, 0 replies; 53+ messages in thread
From: David S. Miller @ 2005-01-27 23:24 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: bdschuym, netdev, netfilter-devel, snort2004, rusty, ak, bridge,
	gandalf, dwmw2, shemminger

On Thu, 27 Jan 2005 23:48:04 +0100
Patrick McHardy <kaber@trash.net> wrote:

> David S. Miller wrote:
> 
> >In the generated asm on sparc64, invoke2() gets a tail-call
> >whereas invoke() does not.  Hmmm...
>
> Apparently support for indirect tail-calls was added to gcc 3.4.

Good data point.

I've been trying to figure out ways to decrease the number of
args that get sent to nf_hook_slow but this would require
some API changes unfortunately.

One idea goes like this, we create little descriptors of the form:

struct nf_hook_desc {
	int (*okfn)(struct sk_buff *);
	int pf;
	int hook;
};

Then NF_HOOK*() callsites do something like this:

static const struct nf_hook_desc nf_ip_local_out = {
	.okfn = dst_output,
	.pf = PF_INET,
	.hook = NF_IP_LOCAL_OUT,
};

...

	/* Send it out. */
	return NF_HOOK(&nf_ip_local_out, skb, NULL, rt->u.dst.dev);

This gets us down to 4 arguments from 6.  I think we can kill
one more.

It is never the case that both indev and outdev are both
set, so we can use some nf_hook_desc piece of state to
indicate which (in or out) the passed device pointer is.

Oh yes, we can nicely add the thresh thing in here too
while we're at it.

So the final nf_hook_desc might look something something like:

struct nf_hook_desc {
	int (*okfn)(struct sk_buff *);
	int hook;
	int thresh;
	u8 pf; /* AF_MAX is 32 */
	u8 is_output;
};

Hook could possibly use a smaller type as well to condense
the size of this thing even further.  I don't know if there
are any nice assumptions we can make about the hook numbers.

Now, back to the compatability issue.  We could create a
new macro, NF_HOOK_DESC() and keep the existing ones around
via some nf_hook_slow() that basically does:

int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
		 struct net_device *indev, struct net_device *outdev,
		 int (*okfn)(struct sk_buff *), int thresh)
{
	struct nf_hook_desc desc;

	desc.okfn = okfn;
	desc.hook = hook;
	desc.thresh = thresh;
	desc.pf = pf;
	desc.is_output = (outdev != NULL);
	return nf_hook_desc(&desc, skb, (outdev ? outdev : indev));
}

So the final new stuff looks something like:

#ifdef CONFIG_NETFILTER
struct nf_hook_desc {
	int (*okfn)(struct sk_buff *);
	int hook;
	int thresh;
	u8 pf; /* AF_MAX is 32 */
	u8 is_output;
};
#define NF_DESC_DECLARE(_name, _okfn, _hook, _thresh, _pf, _is_output) \
static const struct nf_hook_desc _name = { \
	.okfn = _okfn, \
	.hook = _hook, \
	.thresh = _thresh, \
	.pf = _pf, \
	.is_output = _is_output, \
};

extern int nf_hook_desc(struct nf_hook_desc *desc, struct sk_buff *skb,
			struct net_device *dev);

#define NF_HOOK_DESC(_desc, _skb, _dev) \
	nf_hook_desc(_desc, _skb, _dev)
#endif

Just throwing around ideas... comments?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: [PATCH/RFC] Reduce call chain length in netfilter
@ 2005-01-27 23:24                                   ` David S. Miller
  0 siblings, 0 replies; 53+ messages in thread
From: David S. Miller @ 2005-01-27 23:24 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: netdev, netfilter-devel, bdschuym, bridge, snort2004, rusty,
	dwmw2, ak, shemminger, gandalf

On Thu, 27 Jan 2005 23:48:04 +0100
Patrick McHardy <kaber@trash.net> wrote:

> David S. Miller wrote:
> 
> >In the generated asm on sparc64, invoke2() gets a tail-call
> >whereas invoke() does not.  Hmmm...
>
> Apparently support for indirect tail-calls was added to gcc 3.4.

Good data point.

I've been trying to figure out ways to decrease the number of
args that get sent to nf_hook_slow but this would require
some API changes unfortunately.

One idea goes like this, we create little descriptors of the form:

struct nf_hook_desc {
	int (*okfn)(struct sk_buff *);
	int pf;
	int hook;
};

Then NF_HOOK*() callsites do something like this:

static const struct nf_hook_desc nf_ip_local_out = {
	.okfn = dst_output,
	.pf = PF_INET,
	.hook = NF_IP_LOCAL_OUT,
};

...

	/* Send it out. */
	return NF_HOOK(&nf_ip_local_out, skb, NULL, rt->u.dst.dev);

This gets us down to 4 arguments from 6.  I think we can kill
one more.

It is never the case that both indev and outdev are both
set, so we can use some nf_hook_desc piece of state to
indicate which (in or out) the passed device pointer is.

Oh yes, we can nicely add the thresh thing in here too
while we're at it.

So the final nf_hook_desc might look something something like:

struct nf_hook_desc {
	int (*okfn)(struct sk_buff *);
	int hook;
	int thresh;
	u8 pf; /* AF_MAX is 32 */
	u8 is_output;
};

Hook could possibly use a smaller type as well to condense
the size of this thing even further.  I don't know if there
are any nice assumptions we can make about the hook numbers.

Now, back to the compatability issue.  We could create a
new macro, NF_HOOK_DESC() and keep the existing ones around
via some nf_hook_slow() that basically does:

int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
		 struct net_device *indev, struct net_device *outdev,
		 int (*okfn)(struct sk_buff *), int thresh)
{
	struct nf_hook_desc desc;

	desc.okfn = okfn;
	desc.hook = hook;
	desc.thresh = thresh;
	desc.pf = pf;
	desc.is_output = (outdev != NULL);
	return nf_hook_desc(&desc, skb, (outdev ? outdev : indev));
}

So the final new stuff looks something like:

#ifdef CONFIG_NETFILTER
struct nf_hook_desc {
	int (*okfn)(struct sk_buff *);
	int hook;
	int thresh;
	u8 pf; /* AF_MAX is 32 */
	u8 is_output;
};
#define NF_DESC_DECLARE(_name, _okfn, _hook, _thresh, _pf, _is_output) \
static const struct nf_hook_desc _name = { \
	.okfn = _okfn, \
	.hook = _hook, \
	.thresh = _thresh, \
	.pf = _pf, \
	.is_output = _is_output, \
};

extern int nf_hook_desc(struct nf_hook_desc *desc, struct sk_buff *skb,
			struct net_device *dev);

#define NF_HOOK_DESC(_desc, _skb, _dev) \
	nf_hook_desc(_desc, _skb, _dev)
#endif

Just throwing around ideas... comments?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH/RFC] Reduce call chain length in netfilter
  2005-01-27 23:24                                   ` [Bridge] " David S. Miller
@ 2005-01-28  0:08                                     ` Patrick McHardy
  -1 siblings, 0 replies; 53+ messages in thread
From: Patrick McHardy @ 2005-01-28  0:08 UTC (permalink / raw)
  To: David S. Miller
  Cc: bdschuym, netdev, netfilter-devel, snort2004, rusty, ak, bridge,
	gandalf, dwmw2, shemminger

David S. Miller wrote:

>I've been trying to figure out ways to decrease the number of
>args that get sent to nf_hook_slow but this would require
>some API changes unfortunately.
>
>One idea goes like this, we create little descriptors of the form:
>
>struct nf_hook_desc {
>	int (*okfn)(struct sk_buff *);
>	int pf;
>	int hook;
>};
>
>Then NF_HOOK*() callsites do something like this:
>
>static const struct nf_hook_desc nf_ip_local_out = {
>	.okfn = dst_output,
>	.pf = PF_INET,
>	.hook = NF_IP_LOCAL_OUT,
>};
>
>...
>
>	/* Send it out. */
>	return NF_HOOK(&nf_ip_local_out, skb, NULL, rt->u.dst.dev);
>
>This gets us down to 4 arguments from 6.  I think we can kill
>one more.
>
>It is never the case that both indev and outdev are both
>set, so we can use some nf_hook_desc piece of state to
>indicate which (in or out) the passed device pointer is.
>  
>
indev and outdev are both set in the forward hook.

>Oh yes, we can nicely add the thresh thing in here too
>while we're at it.
>
>So the final nf_hook_desc might look something something like:
>
>struct nf_hook_desc {
>	int (*okfn)(struct sk_buff *);
>	int hook;
>	int thresh;
>	u8 pf; /* AF_MAX is 32 */
>	u8 is_output;
>};
>
>Hook could possibly use a smaller type as well to condense
>the size of this thing even further.  I don't know if there
>are any nice assumptions we can make about the hook numbers.
>  
>
There are currently five hooks. I really hope we'll never reach
256, so u8 should be big enough.

>Now, back to the compatability issue.  We could create a
>new macro, NF_HOOK_DESC() and keep the existing ones around
>via some nf_hook_slow() that basically does:
>
>int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
>		 struct net_device *indev, struct net_device *outdev,
>		 int (*okfn)(struct sk_buff *), int thresh)
>{
>	struct nf_hook_desc desc;
>
>	desc.okfn = okfn;
>	desc.hook = hook;
>	desc.thresh = thresh;
>	desc.pf = pf;
>	desc.is_output = (outdev != NULL);
>	return nf_hook_desc(&desc, skb, (outdev ? outdev : indev));
>}
>
>So the final new stuff looks something like:
>
>#ifdef CONFIG_NETFILTER
>struct nf_hook_desc {
>	int (*okfn)(struct sk_buff *);
>	int hook;
>	int thresh;
>	u8 pf; /* AF_MAX is 32 */
>	u8 is_output;
>};
>#define NF_DESC_DECLARE(_name, _okfn, _hook, _thresh, _pf, _is_output) \
>static const struct nf_hook_desc _name = { \
>	.okfn = _okfn, \
>	.hook = _hook, \
>	.thresh = _thresh, \
>	.pf = _pf, \
>	.is_output = _is_output, \
>};
>
>extern int nf_hook_desc(struct nf_hook_desc *desc, struct sk_buff *skb,
>			struct net_device *dev);
>
>#define NF_HOOK_DESC(_desc, _skb, _dev) \
>	nf_hook_desc(_desc, _skb, _dev)
>#endif
>
>Just throwing around ideas... comments?
>  
>
Sounds like a good idea to get rid of the static arguments to
nf_hook_slow. Keeping both devices we are still down from 7 to
4 arguments with your suggestion.

Regards
Patrick

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: [PATCH/RFC] Reduce call chain length in netfilter
@ 2005-01-28  0:08                                     ` Patrick McHardy
  0 siblings, 0 replies; 53+ messages in thread
From: Patrick McHardy @ 2005-01-28  0:08 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, netfilter-devel, bdschuym, bridge, snort2004, rusty,
	dwmw2, ak, shemminger, gandalf

David S. Miller wrote:

>I've been trying to figure out ways to decrease the number of
>args that get sent to nf_hook_slow but this would require
>some API changes unfortunately.
>
>One idea goes like this, we create little descriptors of the form:
>
>struct nf_hook_desc {
>	int (*okfn)(struct sk_buff *);
>	int pf;
>	int hook;
>};
>
>Then NF_HOOK*() callsites do something like this:
>
>static const struct nf_hook_desc nf_ip_local_out = {
>	.okfn = dst_output,
>	.pf = PF_INET,
>	.hook = NF_IP_LOCAL_OUT,
>};
>
>...
>
>	/* Send it out. */
>	return NF_HOOK(&nf_ip_local_out, skb, NULL, rt->u.dst.dev);
>
>This gets us down to 4 arguments from 6.  I think we can kill
>one more.
>
>It is never the case that both indev and outdev are both
>set, so we can use some nf_hook_desc piece of state to
>indicate which (in or out) the passed device pointer is.
>  
>
indev and outdev are both set in the forward hook.

>Oh yes, we can nicely add the thresh thing in here too
>while we're at it.
>
>So the final nf_hook_desc might look something something like:
>
>struct nf_hook_desc {
>	int (*okfn)(struct sk_buff *);
>	int hook;
>	int thresh;
>	u8 pf; /* AF_MAX is 32 */
>	u8 is_output;
>};
>
>Hook could possibly use a smaller type as well to condense
>the size of this thing even further.  I don't know if there
>are any nice assumptions we can make about the hook numbers.
>  
>
There are currently five hooks. I really hope we'll never reach
256, so u8 should be big enough.

>Now, back to the compatability issue.  We could create a
>new macro, NF_HOOK_DESC() and keep the existing ones around
>via some nf_hook_slow() that basically does:
>
>int nf_hook_slow(int pf, unsigned int hook, struct sk_buff *skb,
>		 struct net_device *indev, struct net_device *outdev,
>		 int (*okfn)(struct sk_buff *), int thresh)
>{
>	struct nf_hook_desc desc;
>
>	desc.okfn = okfn;
>	desc.hook = hook;
>	desc.thresh = thresh;
>	desc.pf = pf;
>	desc.is_output = (outdev != NULL);
>	return nf_hook_desc(&desc, skb, (outdev ? outdev : indev));
>}
>
>So the final new stuff looks something like:
>
>#ifdef CONFIG_NETFILTER
>struct nf_hook_desc {
>	int (*okfn)(struct sk_buff *);
>	int hook;
>	int thresh;
>	u8 pf; /* AF_MAX is 32 */
>	u8 is_output;
>};
>#define NF_DESC_DECLARE(_name, _okfn, _hook, _thresh, _pf, _is_output) \
>static const struct nf_hook_desc _name = { \
>	.okfn = _okfn, \
>	.hook = _hook, \
>	.thresh = _thresh, \
>	.pf = _pf, \
>	.is_output = _is_output, \
>};
>
>extern int nf_hook_desc(struct nf_hook_desc *desc, struct sk_buff *skb,
>			struct net_device *dev);
>
>#define NF_HOOK_DESC(_desc, _skb, _dev) \
>	nf_hook_desc(_desc, _skb, _dev)
>#endif
>
>Just throwing around ideas... comments?
>  
>
Sounds like a good idea to get rid of the static arguments to
nf_hook_slow. Keeping both devices we are still down from 7 to
4 arguments with your suggestion.

Regards
Patrick


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH/RFC] Reduce call chain length in netfilter
  2005-01-27 23:24                                   ` [Bridge] " David S. Miller
@ 2005-01-28  0:29                                     ` Rusty Russell
  -1 siblings, 0 replies; 53+ messages in thread
From: Rusty Russell @ 2005-01-28  0:29 UTC (permalink / raw)
  To: David S. Miller
  Cc: Patrick McHardy, bdschuym, netdev,
	Netfilter development mailing list, snort2004, ak, bridge,
	gandalf, dwmw2, shemminger

On Thu, 2005-01-27 at 15:24 -0800, David S. Miller wrote:
> It is never the case that both indev and outdev are both
> set, so we can use some nf_hook_desc piece of state to
> indicate which (in or out) the passed device pointer is.

NF_IP_FORWARD... it's really *really* useful for filtering forwarded
packets.  However, is it ever the case that indev != skb->dev?  If not,
we can simply drop that arg and use skb->dev.

> Now, back to the compatability issue.  We could create a
> new macro, NF_HOOK_DESC() and keep the existing ones around
> via some nf_hook_slow() that basically does:

No, let's just fix them all.  Also, gcc 3.4 will discard unused static
variables, so I prefer the kernel start just declaring structs as
normal, and have them "used" in the !CONFIG case by an inline function
which allows gcc to realize that it can be eliminated, but suppresses
unused warnings.  This also means you get type checking etc without the
config option.

BTW, someone should go through and start ripping out #ifdef
CONFIG_PROC_FS around those structs, too...

Cheers,
Rusty.
-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: [PATCH/RFC] Reduce call chain length in netfilter
@ 2005-01-28  0:29                                     ` Rusty Russell
  0 siblings, 0 replies; 53+ messages in thread
From: Rusty Russell @ 2005-01-28  0:29 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, bdschuym, bridge, snort2004, gandalf,
	Netfilter development mailing list, dwmw2, ak, shemminger

On Thu, 2005-01-27 at 15:24 -0800, David S. Miller wrote:
> It is never the case that both indev and outdev are both
> set, so we can use some nf_hook_desc piece of state to
> indicate which (in or out) the passed device pointer is.

NF_IP_FORWARD... it's really *really* useful for filtering forwarded
packets.  However, is it ever the case that indev != skb->dev?  If not,
we can simply drop that arg and use skb->dev.

> Now, back to the compatability issue.  We could create a
> new macro, NF_HOOK_DESC() and keep the existing ones around
> via some nf_hook_slow() that basically does:

No, let's just fix them all.  Also, gcc 3.4 will discard unused static
variables, so I prefer the kernel start just declaring structs as
normal, and have them "used" in the !CONFIG case by an inline function
which allows gcc to realize that it can be eliminated, but suppresses
unused warnings.  This also means you get type checking etc without the
config option.

BTW, someone should go through and start ripping out #ifdef
CONFIG_PROC_FS around those structs, too...

Cheers,
Rusty.
-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH/RFC] Reduce call chain length in netfilter
  2005-01-28  0:29                                     ` [Bridge] " Rusty Russell
@ 2005-01-28  1:10                                       ` David S. Miller
  -1 siblings, 0 replies; 53+ messages in thread
From: David S. Miller @ 2005-01-28  1:10 UTC (permalink / raw)
  To: Rusty Russell
  Cc: kaber, bdschuym, netdev, netfilter-devel, snort2004, ak, bridge,
	gandalf, dwmw2, shemminger

On Fri, 28 Jan 2005 11:29:29 +1100
Rusty Russell <rusty@rustcorp.com.au> wrote:

> No, let's just fix them all.

In tree, yes.  But leaving the NF_HOOK()/NF_HOOK_THRESH() compat
macros in there for out-of-tree modules I feel is mandatory, it's
a major API change.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: [PATCH/RFC] Reduce call chain length in netfilter
@ 2005-01-28  1:10                                       ` David S. Miller
  0 siblings, 0 replies; 53+ messages in thread
From: David S. Miller @ 2005-01-28  1:10 UTC (permalink / raw)
  To: Rusty Russell
  Cc: netdev, bdschuym, bridge, snort2004, gandalf, netfilter-devel,
	dwmw2, ak, shemminger

On Fri, 28 Jan 2005 11:29:29 +1100
Rusty Russell <rusty@rustcorp.com.au> wrote:

> No, let's just fix them all.

In tree, yes.  But leaving the NF_HOOK()/NF_HOOK_THRESH() compat
macros in there for out-of-tree modules I feel is mandatory, it's
a major API change.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH/RFC] Reduce call chain length in netfilter
  2005-01-28  1:10                                       ` [Bridge] " David S. Miller
@ 2005-01-28  1:32                                         ` Rusty Russell
  -1 siblings, 0 replies; 53+ messages in thread
From: Rusty Russell @ 2005-01-28  1:32 UTC (permalink / raw)
  To: David S. Miller
  Cc: Patrick McHardy, bdschuym, netdev,
	Netfilter development mailing list, snort2004, ak, bridge,
	gandalf, dwmw2, shemminger

On Thu, 2005-01-27 at 17:10 -0800, David S. Miller wrote:
> On Fri, 28 Jan 2005 11:29:29 +1100
> Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
> > No, let's just fix them all.
> 
> In tree, yes.  But leaving the NF_HOOK()/NF_HOOK_THRESH() compat
> macros in there for out-of-tree modules I feel is mandatory, it's
> a major API change.

I'm not so sure.  The hook functions which are registered, sure (ie.
keep the calling convention the same).  But do any external modules use
NF_HOOK()?  That implies they're writing their own network stack for
some protocol, which I would expect to be uncommon.

Rusty.
-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: [PATCH/RFC] Reduce call chain length in netfilter
@ 2005-01-28  1:32                                         ` Rusty Russell
  0 siblings, 0 replies; 53+ messages in thread
From: Rusty Russell @ 2005-01-28  1:32 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, bdschuym, bridge, snort2004, gandalf,
	Netfilter development mailing list, dwmw2, ak, shemminger

On Thu, 2005-01-27 at 17:10 -0800, David S. Miller wrote:
> On Fri, 28 Jan 2005 11:29:29 +1100
> Rusty Russell <rusty@rustcorp.com.au> wrote:
> 
> > No, let's just fix them all.
> 
> In tree, yes.  But leaving the NF_HOOK()/NF_HOOK_THRESH() compat
> macros in there for out-of-tree modules I feel is mandatory, it's
> a major API change.

I'm not so sure.  The hook functions which are registered, sure (ie.
keep the calling convention the same).  But do any external modules use
NF_HOOK()?  That implies they're writing their own network stack for
some protocol, which I would expect to be uncommon.

Rusty.
-- 
A bad analogy is like a leaky screwdriver -- Richard Braakman


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH/RFC] Reduce call chain length in netfilter
  2005-01-28  1:32                                         ` [Bridge] " Rusty Russell
@ 2005-01-28  1:35                                           ` Patrick McHardy
  -1 siblings, 0 replies; 53+ messages in thread
From: Patrick McHardy @ 2005-01-28  1:35 UTC (permalink / raw)
  To: Rusty Russell
  Cc: David S. Miller, bdschuym, netdev,
	Netfilter development mailing list, snort2004, ak, bridge,
	gandalf, dwmw2, shemminger

Rusty Russell wrote:

>On Thu, 2005-01-27 at 17:10 -0800, David S. Miller wrote:
>
>>In tree, yes.  But leaving the NF_HOOK()/NF_HOOK_THRESH() compat
>>macros in there for out-of-tree modules I feel is mandatory, it's
>>a major API change.
>>
>
>I'm not so sure.  The hook functions which are registered, sure (ie.
>keep the calling convention the same).  But do any external modules use
>NF_HOOK()?  That implies they're writing their own network stack for
>some protocol, which I would expect to be uncommon.
>
Freeswan comes to mind. But I guess one more #ifdef can't hurt :)

Regards
Patrick

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Bridge] Re: [PATCH/RFC] Reduce call chain length in netfilter
@ 2005-01-28  1:35                                           ` Patrick McHardy
  0 siblings, 0 replies; 53+ messages in thread
From: Patrick McHardy @ 2005-01-28  1:35 UTC (permalink / raw)
  To: Rusty Russell
  Cc: netdev, bdschuym, bridge, snort2004, gandalf,
	Netfilter development mailing list, dwmw2, ak, shemminger,
	David S. Miller

Rusty Russell wrote:

>On Thu, 2005-01-27 at 17:10 -0800, David S. Miller wrote:
>
>>In tree, yes.  But leaving the NF_HOOK()/NF_HOOK_THRESH() compat
>>macros in there for out-of-tree modules I feel is mandatory, it's
>>a major API change.
>>
>
>I'm not so sure.  The hook functions which are registered, sure (ie.
>keep the calling convention the same).  But do any external modules use
>NF_HOOK()?  That implies they're writing their own network stack for
>some protocol, which I would expect to be uncommon.
>
Freeswan comes to mind. But I guess one more #ifdef can't hurt :)

Regards
Patrick


^ permalink raw reply	[flat|nested] 53+ messages in thread

* do_IRQ: stack overflow: 872..
@ 2004-12-18  6:27 Crazy AMD K7
  0 siblings, 0 replies; 53+ messages in thread
From: Crazy AMD K7 @ 2004-12-18  6:27 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 15957 bytes --]

Hi!
I have found a few days ago strange messages in /var/log/messages
More than 10 times there was do_IRQ: stack overflow: (nimber).... followed
with code. If need I can send all this data. I have run
ksymoops with only first 3 cases. Here is the first, the second and
the third are in attachment.
After that oopses my system continued to work.

uname uname -a
Linux linux 2.4.28 #2 Втр Ноя 30 15:43:35 MSK 2004 i686 unknown
gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-113)
I have applies ebtables_brnf patch (http://bridge.sf.net) and a
small add to it. The problem occur a few days later I have installed a
new hdd.


My system works as a bridge.
Processor AMDK7-900
RAM 512Mb
HDD IDE 200+200Gb
NIC Compex RE100TX(rtl8139 chipset)


If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux linux 2.4.28 #2 Втр Ноя 30 15:43:35 MSK 2004 i686 unknown

Gnu C                  2.96
Gnu make               3.79.1
util-linux             2.11n
mount                  2.11n
modutils               2.4.18
e2fsprogs              1.27
reiserfsprogs          3.x.0j
quota-tools            3.01.
PPP                    2.4.1
Linux C Library        2.2.5
Dynamic linker (ldd)   2.2.5
Procps                 2.0.7
Net-tools              1.60
Console-tools          0.3.3
Sh-utils               2.0.11
Modules Loaded


ksymoops 2.4.4 on i686 2.4.28.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.28/ (default)
     -m /boot/System.map-2.4.28-2 (specified)

No modules in ksyms, skipping objects
Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid lsmod file?
Dec 14 21:23:49 localhost kernel: c0262414 00000368 00000002 c0325640 00000028 c03256f0 c010a508 00000002 
Dec 14 21:23:49 localhost kernel:        00000028 000001f3 c0325640 00000028 c03256f0 00000079 00000018 00000018 
Dec 14 21:23:49 localhost kernel:        ffffff0a c01c17ba 00000010 00000286 c01d0025 00000079 000001f3 00000028 
Dec 14 21:23:49 localhost kernel: Call Trace:    [<c010a508>] [<c01c17ba>] [<c01d0025>] [<c01c9c0d>] [<c01c9ee2>]
Dec 14 21:23:49 localhost kernel:   [<c01ca35f>] [<c01ce520>] [<c0108079>] [<c0108208>] [<c010a508>] [<c02576bb>]
Dec 14 21:23:49 localhost kernel:   [<c0208dc8>] [<c020777b>] [<c020790d>] [<c01be170>] [<c0108208>] [<c010822c>]
Dec 14 21:23:49 localhost kernel:   [<c020afd5>] [<c02530e0>] [<c02146fc>] [<c020b1de>] [<c02530e0>] [<c024642c>]
Dec 14 21:23:49 localhost kernel:   [<c0253199>] [<c02130b3>] [<c02530e0>] [<c02530e0>] [<c021342e>] [<c0213460>]
Dec 14 21:23:49 localhost kernel:   [<c0253199>] [<c02130b3>] [<c0256e10>] [<c02530e0>] [<c0244098>] [<c02530e0>]
Dec 14 21:23:49 localhost kernel:   [<c02130b3>] [<c02530e0>] [<c02530e0>] [<c021342e>] [<c02530e0>] [<c02130b3>]
Dec 14 21:23:49 localhost kernel:   [<c02531b0>] [<c02531e9>] [<c02530e0>] [<c0213460>] [<c01be188>] [<c02531b0>]
Dec 14 21:23:49 localhost kernel:   [<c0256cbf>] [<c02531b0>] [<c02146c4>] [<c02531b0>] [<c02130b3>] [<c02531b0>]
Dec 14 21:23:49 localhost kernel:   [<c02531b0>] [<c021342e>] [<c02531b0>] [<c020777b>] [<c020790d>] [<c025322a>]
Dec 14 21:23:49 localhost kernel:   [<c02531b0>] [<c02532ab>] [<c025276a>] [<c020b02e>] [<c025279d>] [<c020b255>]
Dec 14 21:23:49 localhost kernel:   [<c0244098>] [<c021e9e7>] [<c02530e0>] [<c02530e0>] [<c021342e>] [<c021e940>]
Dec 14 21:23:49 localhost kernel:   [<c0256f61>] [<c021e940>] [<c02130b3>] [<c021e940>] [<c021e940>] [<c021342e>]
Dec 14 21:23:49 localhost kernel:   [<c021e940>] [<c0213460>] [<c021d649>] [<c021e940>] [<c02531b0>] [<c02130b3>]
Dec 14 21:23:49 localhost kernel:   [<c02531b0>] [<c021eb60>] [<c0246498>] [<c021ea40>] [<c0256f61>] [<c021ea40>]
Dec 14 21:23:49 localhost kernel:   [<c02130b3>] [<c021ea40>] [<c021ea40>] [<c021342e>] [<c021ea40>] [<c02146c4>]
Dec 14 21:23:49 localhost kernel:   [<c021dac1>] [<c021ea40>] [<c01be188>] [<c020afd5>] [<c02146c4>] [<c024642c>]
Dec 14 21:23:49 localhost kernel:   [<c0253199>] [<c020b1de>] [<c023296d>] [<c022db29>] [<c02292c1>] [<c0228b18>]
Dec 14 21:23:49 localhost kernel:   [<c022e5e1>] [<c0256e10>] [<c022accb>] [<c022b929>] [<c022bd67>] [<c02530e0>]
Dec 14 21:23:49 localhost kernel:   [<c02130b3>] [<c02530e0>] [<c02530e0>] [<c021342e>] [<c02530e0>] [<c02130b3>]
Dec 14 21:23:49 localhost kernel:   [<c02531b0>] [<c02531e9>] [<c02530e0>] [<c0213460>] [<c02088f5>] [<c0233719>]
Dec 14 21:23:49 localhost kernel:   [<c023366e>] [<c0233bc4>] [<c0244098>] [<c021acb0>] [<c02463ac>] [<c021ad67>]
Dec 14 21:23:49 localhost kernel:   [<c021acb0>] [<c021acb0>] [<c021342e>] [<c021acb0>] [<c0213460>] [<c02560c0>]
Dec 14 21:23:49 localhost kernel:   [<c024642c>] [<c021ade0>] [<c021a93c>] [<c021acb0>] [<c02560c0>] [<c02188fd>]
Dec 14 21:23:49 localhost kernel:   [<c021ade0>] [<c021ade0>] [<c021af4b>] [<c0253b70>] [<c02568c0>] [<c021ade0>]
Dec 14 21:23:49 localhost kernel:   [<c0256e3b>] [<c02130b3>] [<c021ade0>] [<c021ade0>] [<c021342e>] [<c021ade0>]
Dec 14 21:23:49 localhost kernel:   [<c021d649>] [<c021e940>] [<c021ac7d>] [<c021ade0>] [<c0207212>] [<c0250f2c>]
Dec 14 21:23:49 localhost kernel:   [<c0253b70>] [<c02075a1>] [<c020b7df>] [<c020b87d>] [<c020b99c>] [<c0108079>]
Dec 14 21:23:49 localhost kernel:   [<c01176cb>] [<c010823c>] [<c010a508>] [<c015f13b>] [<c0158728>] [<c021ade0>]
Dec 14 21:23:49 localhost kernel:   [<c021ade0>] [<c021af4b>] [<c016320b>] [<c015ec35>] [<c0132e86>] [<c015ec97>]
Dec 14 21:23:49 localhost kernel:   [<c0158ab0>] [<c021ac7d>] [<c016320b>] [<c0158a53>] [<c0158b57>] [<c0158c2c>]
Dec 14 21:23:49 localhost kernel:   [<c015ef55>] [<c014312e>] [<c015384a>] [<c016320b>] [<c015ec35>] [<c015f29d>]
Dec 14 21:23:49 localhost kernel:   [<c0158728>] [<c0158791>] [<c01ce520>] [<c01d01d7>] [<c016320b>] [<c015ec35>]
Dec 14 21:23:49 localhost kernel:   [<c0132e86>] [<c012b094>] [<c015ec97>] [<c016320b>] [<c015ec35>] [<c0155b79>]
Dec 14 21:23:49 localhost kernel:   [<c0155e95>] [<c01297f5>] [<c0132bf8>] [<c0132c09>] [<c0132e86>] [<c0155d03>]
Dec 14 21:23:49 localhost kernel:   [<c015656d>] [<c0132e86>] [<c01330b1>] [<c01566c9>] [<c0133616>] [<c0164f07>]
Dec 14 21:23:49 localhost kernel:   [<c0133ecd>] [<c0156670>] [<c0156b1c>] [<c0156670>] [<c01297f5>] [<c012339d>]
Dec 14 21:23:49 localhost kernel:   [<c0125ec5>] [<c0126320>] [<c015467f>] [<c0131175>] [<c013eea8>] [<c0116feb>]
Dec 14 21:23:49 localhost kernel:   [<c0106cb3>]
Warning (Oops_read): Code line not seen, dumping what data is available

Trace; c010a508 <call_do_IRQ+5/d>
Trace; c01c17ba <ide_outb+a/10>
Trace; c01d0025 <__ide_do_rw_disk+155/520>
Trace; c01c9c0d <ide_start_request+18d/1c0>
Trace; c01c9ee2 <ide_do_request+272/2b0>
Trace; c01ca35f <ide_intr+df/100>
Trace; c01ce520 <ide_dma_intr+0/a0>
Trace; c0108079 <handle_IRQ_event+39/60>
Trace; c0108208 <do_IRQ+88/d0>
Trace; c010a508 <call_do_IRQ+5/d>
Trace; c02576bb <_mmx_memcpy+2b/150>
Trace; c0208dc8 <skb_copy_and_csum_dev+78/d0>
Trace; c020777b <kfree_skbmem+b/60>
Trace; c020790d <__kfree_skb+13d/150>
Trace; c01be170 <rtl8139_start_xmit+50/100>
Trace; c0108208 <do_IRQ+88/d0>
Trace; c010822c <do_IRQ+ac/d0>
Trace; c020afd5 <dev_queue_xmit_nit+15/a0>
Trace; c02530e0 <br_dev_queue_push_xmit+0/d0>
Trace; c02146fc <qdisc_restart+4c/d0>
Trace; c020b1de <dev_queue_xmit+fe/260>
Trace; c02530e0 <br_dev_queue_push_xmit+0/d0>
Trace; c024642c <ipt_route_hook+1c/20>
Trace; c0253199 <br_dev_queue_push_xmit+b9/d0>
Trace; c02130b3 <nf_iterate+33/90>
Trace; c02530e0 <br_dev_queue_push_xmit+0/d0>
Trace; c02530e0 <br_dev_queue_push_xmit+0/d0>
Trace; c021342e <nf_hook_slow+ae/140>
Trace; c0213460 <nf_hook_slow+e0/140>
Trace; c0253199 <br_dev_queue_push_xmit+b9/d0>
Trace; c02130b3 <nf_iterate+33/90>
Trace; c0256e10 <br_nf_post_routing+140/150>
Trace; c02530e0 <br_dev_queue_push_xmit+0/d0>
Trace; c0244098 <ipt_do_table+308/450>
Trace; c02530e0 <br_dev_queue_push_xmit+0/d0>
Trace; c02130b3 <nf_iterate+33/90>
Trace; c02530e0 <br_dev_queue_push_xmit+0/d0>
Trace; c02530e0 <br_dev_queue_push_xmit+0/d0>
Trace; c021342e <nf_hook_slow+ae/140>
Trace; c02530e0 <br_dev_queue_push_xmit+0/d0>
Trace; c02130b3 <nf_iterate+33/90>
Trace; c02531b0 <br_forward_finish+0/40>
Trace; c02531e9 <br_forward_finish+39/40>
Trace; c02530e0 <br_dev_queue_push_xmit+0/d0>
Trace; c0213460 <nf_hook_slow+e0/140>
Trace; c01be188 <rtl8139_start_xmit+68/100>
Trace; c02531b0 <br_forward_finish+0/40>
Trace; c0256cbf <br_nf_local_out+19f/1b0>
Trace; c02531b0 <br_forward_finish+0/40>
Trace; c02146c4 <qdisc_restart+14/d0>
Trace; c02531b0 <br_forward_finish+0/40>
Trace; c02130b3 <nf_iterate+33/90>
Trace; c02531b0 <br_forward_finish+0/40>
Trace; c02531b0 <br_forward_finish+0/40>
Trace; c021342e <nf_hook_slow+ae/140>
Trace; c02531b0 <br_forward_finish+0/40>
Trace; c020777b <kfree_skbmem+b/60>
Trace; c020790d <__kfree_skb+13d/150>
Trace; c025322a <__br_deliver+3a/40>
Trace; c02531b0 <br_forward_finish+0/40>
Trace; c02532ab <br_deliver+2b/50>
Trace; c025276a <__br_dev_xmit+6a/90>
Trace; c020b02e <dev_queue_xmit_nit+6e/a0>
Trace; c025279d <br_dev_xmit+d/10>
Trace; c020b255 <dev_queue_xmit+175/260>
Trace; c0244098 <ipt_do_table+308/450>
Trace; c021e9e7 <ip_finish_output2+a7/100>
Trace; c02530e0 <br_dev_queue_push_xmit+0/d0>
Trace; c02530e0 <br_dev_queue_push_xmit+0/d0>
Trace; c021342e <nf_hook_slow+ae/140>
Trace; c021e940 <ip_finish_output2+0/100>
Trace; c0256f61 <ip_sabotage_out+111/130>
Trace; c021e940 <ip_finish_output2+0/100>
Trace; c02130b3 <nf_iterate+33/90>
Trace; c021e940 <ip_finish_output2+0/100>
Trace; c021e940 <ip_finish_output2+0/100>
Trace; c021342e <nf_hook_slow+ae/140>
Trace; c021e940 <ip_finish_output2+0/100>
Trace; c0213460 <nf_hook_slow+e0/140>
Trace; c021d649 <ip_output+139/150>
Trace; c021e940 <ip_finish_output2+0/100>
Trace; c02531b0 <br_forward_finish+0/40>
Trace; c02130b3 <nf_iterate+33/90>
Trace; c02531b0 <br_forward_finish+0/40>
Trace; c021eb60 <ip_queue_xmit2+120/1f7>
Trace; c0246498 <ipt_local_hook+68/b0>
Trace; c021ea40 <ip_queue_xmit2+0/1f7>
Trace; c0256f61 <ip_sabotage_out+111/130>
Trace; c021ea40 <ip_queue_xmit2+0/1f7>
Trace; c02130b3 <nf_iterate+33/90>
Trace; c021ea40 <ip_queue_xmit2+0/1f7>
Trace; c021ea40 <ip_queue_xmit2+0/1f7>
Trace; c021342e <nf_hook_slow+ae/140>
Trace; c021ea40 <ip_queue_xmit2+0/1f7>
Trace; c02146c4 <qdisc_restart+14/d0>
Trace; c021dac1 <ip_queue_xmit+461/4b0>
Trace; c021ea40 <ip_queue_xmit2+0/1f7>
Trace; c01be188 <rtl8139_start_xmit+68/100>
Trace; c020afd5 <dev_queue_xmit_nit+15/a0>
Trace; c02146c4 <qdisc_restart+14/d0>
Trace; c024642c <ipt_route_hook+1c/20>
Trace; c0253199 <br_dev_queue_push_xmit+b9/d0>
Trace; c020b1de <dev_queue_xmit+fe/260>
Trace; c023296d <tcp_v4_send_check+6d/b0>
Trace; c022db29 <tcp_transmit_skb+549/670>
Trace; c02292c1 <tcp_clean_rtx_queue+221/320>
Trace; c0228b18 <tcp_fastretrans_alert+468/620>
Trace; c022e5e1 <tcp_write_xmit+151/290>
Trace; c0256e10 <br_nf_post_routing+140/150>
Trace; c022accb <tcp_data_queue+3bb/970>
Trace; c022b929 <__tcp_data_snd_check+49/d0>
Trace; c022bd67 <tcp_rcv_established+137/8d0>
Trace; c02530e0 <br_dev_queue_push_xmit+0/d0>
Trace; c02130b3 <nf_iterate+33/90>
Trace; c02530e0 <br_dev_queue_push_xmit+0/d0>
Trace; c02530e0 <br_dev_queue_push_xmit+0/d0>
Trace; c021342e <nf_hook_slow+ae/140>
Trace; c02530e0 <br_dev_queue_push_xmit+0/d0>
Trace; c02130b3 <nf_iterate+33/90>
Trace; c02531b0 <br_forward_finish+0/40>
Trace; c02531e9 <br_forward_finish+39/40>
Trace; c02530e0 <br_dev_queue_push_xmit+0/d0>
Trace; c0213460 <nf_hook_slow+e0/140>
Trace; c02088f5 <skb_checksum+45/240>
Trace; c0233719 <tcp_v4_do_rcv+29/100>
Trace; c023366e <tcp_v4_checksum_init+7e/100>
Trace; c0233bc4 <tcp_v4_rcv+3d4/620>
Trace; c0244098 <ipt_do_table+308/450>
Trace; c021acb0 <ip_local_deliver_finish+0/130>
Trace; c02463ac <ipt_hook+1c/20>
Trace; c021ad67 <ip_local_deliver_finish+b7/130>
Trace; c021acb0 <ip_local_deliver_finish+0/130>
Trace; c021acb0 <ip_local_deliver_finish+0/130>
Trace; c021342e <nf_hook_slow+ae/140>
Trace; c021acb0 <ip_local_deliver_finish+0/130>
Trace; c0213460 <nf_hook_slow+e0/140>
Trace; c02560c0 <br_nf_pre_routing_finish+0/1f0>
Trace; c024642c <ipt_route_hook+1c/20>
Trace; c021ade0 <ip_rcv_finish+0/1a0>
Trace; c021a93c <ip_local_deliver+17c/190>
Trace; c021acb0 <ip_local_deliver_finish+0/130>
Trace; c02560c0 <br_nf_pre_routing_finish+0/1f0>
Trace; c02188fd <ip_route_input+3d/130>
Trace; c021ade0 <ip_rcv_finish+0/1a0>
Trace; c021ade0 <ip_rcv_finish+0/1a0>
Trace; c021af4b <ip_rcv_finish+16b/1a0>
Trace; c0253b70 <br_handle_frame_finish+0/110>
Trace; c02568c0 <br_nf_pre_routing+330/350>
Trace; c021ade0 <ip_rcv_finish+0/1a0>
Trace; c0256e3b <ip_sabotage_in+1b/30>
Trace; c02130b3 <nf_iterate+33/90>
Trace; c021ade0 <ip_rcv_finish+0/1a0>
Trace; c021ade0 <ip_rcv_finish+0/1a0>
Trace; c021342e <nf_hook_slow+ae/140>
Trace; c021ade0 <ip_rcv_finish+0/1a0>
Trace; c021d649 <ip_output+139/150>
Trace; c021e940 <ip_finish_output2+0/100>
Trace; c021ac7d <ip_rcv+32d/360>
Trace; c021ade0 <ip_rcv_finish+0/1a0>
Trace; c0207212 <sock_def_readable+22/50>
Trace; c0250f2c <packet_rcv+ec/260>
Trace; c0253b70 <br_handle_frame_finish+0/110>
Trace; c02075a1 <alloc_skb+d1/190>
Trace; c020b7df <netif_receive_skb+17f/1b0>
Trace; c020b87d <process_backlog+6d/120>
Trace; c020b99c <net_rx_action+6c/100>
Trace; c0108079 <handle_IRQ_event+39/60>
Trace; c01176cb <do_softirq+4b/90>
Trace; c010823c <do_IRQ+bc/d0>
Trace; c010a508 <call_do_IRQ+5/d>
Trace; c015f13b <journal_dirty_metadata+1b/1a0>
Trace; c0158728 <ext3_do_update_inode+328/3c0>
Trace; c021ade0 <ip_rcv_finish+0/1a0>
Trace; c021ade0 <ip_rcv_finish+0/1a0>
Trace; c021af4b <ip_rcv_finish+16b/1a0>
Trace; c016320b <journal_cancel_revoke+fb/170>
Trace; c015ec35 <do_get_write_access+4b5/4e0>
Trace; c0132e86 <bread+16/80>
Trace; c015ec97 <journal_get_write_access+37/50>
Trace; c0158ab0 <ext3_reserve_inode_write+30/b0>
Trace; c021ac7d <ip_rcv+32d/360>
Trace; c016320b <journal_cancel_revoke+fb/170>
Trace; c0158a53 <ext3_mark_iloc_dirty+23/50>
Trace; c0158b57 <ext3_mark_inode_dirty+27/40>
Trace; c0158c2c <ext3_dirty_inode+bc/100>
Trace; c015ef55 <journal_get_undo_access+f5/120>
Trace; c014312e <__mark_inode_dirty+2e/90>
Trace; c015384a <ext3_new_block+aa/830>
Trace; c016320b <journal_cancel_revoke+fb/170>
Trace; c015ec35 <do_get_write_access+4b5/4e0>
Trace; c015f29d <journal_dirty_metadata+17d/1a0>
Trace; c0158728 <ext3_do_update_inode+328/3c0>
Trace; c0158791 <ext3_do_update_inode+391/3c0>
Trace; c01ce520 <ide_dma_intr+0/a0>
Trace; c01d01d7 <__ide_do_rw_disk+307/520>
Trace; c016320b <journal_cancel_revoke+fb/170>
Trace; c015ec35 <do_get_write_access+4b5/4e0>
Trace; c0132e86 <bread+16/80>
Trace; c012b094 <__alloc_pages+74/2c0>
Trace; c015ec97 <journal_get_write_access+37/50>
Trace; c016320b <journal_cancel_revoke+fb/170>
Trace; c015ec35 <do_get_write_access+4b5/4e0>
Trace; c0155b79 <ext3_alloc_block+19/20>
Trace; c0155e95 <ext3_alloc_branch+55/2d0>
Trace; c01297f5 <lru_cache_add+65/70>
Trace; c0132bf8 <getblk+28/60>
Trace; c0132c09 <getblk+39/60>
Trace; c0132e86 <bread+16/80>
Trace; c0155d03 <ext3_get_branch+53/d0>
Trace; c015656d <ext3_get_block_handle+1bd/2c0>
Trace; c0132e86 <bread+16/80>
Trace; c01330b1 <create_buffers+61/f0>
Trace; c01566c9 <ext3_get_block+59/60>
Trace; c0133616 <__block_prepare_write+e6/300>
Trace; c0164f07 <__jbd_kmalloc+27/a0>
Trace; c0133ecd <block_prepare_write+1d/40>
Trace; c0156670 <ext3_get_block+0/60>
Trace; c0156b1c <ext3_prepare_write+7c/120>
Trace; c0156670 <ext3_get_block+0/60>
Trace; c01297f5 <lru_cache_add+65/70>
Trace; c012339d <add_to_page_cache_unique+9d/b0>
Trace; c0125ec5 <do_generic_file_write+255/3e0>
Trace; c0126320 <generic_file_write+f0/110>
Trace; c015467f <ext3_file_write+1f/b0>
Trace; c0131175 <sys_write+95/f0>
Trace; c013eea8 <sys_select+468/480>
Trace; c0116feb <sys_gettimeofday+1b/a0>
Trace; c0106cb3 <system_call+33/38>


2 warnings issued.  Results may not be reliable.

Thank you,
Pasha

[-- Attachment #2: sto2.out.txt.gz --]
[-- Type: application/x-gzip, Size: 2983 bytes --]

[-- Attachment #3: sto3.out.txt.gz --]
[-- Type: application/x-gzip, Size: 3017 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2005-01-28  1:35 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1131604877.20041218092730@mail.ru.suse.lists.linux.kernel>
2004-12-18  7:50 ` do_IRQ: stack overflow: 872 Andi Kleen
2004-12-18 11:12   ` Bart De Schuymer
2004-12-18 11:14     ` Andi Kleen
2004-12-18 11:51       ` Bart De Schuymer
2004-12-18 13:53         ` Andi Kleen
2004-12-18 16:07           ` Re[2]: " Crazy AMD K7
2004-12-18 16:46             ` Bart De Schuymer
2005-01-07 17:05   ` David Woodhouse
2005-01-07 18:00     ` Stephen Hemminger
2005-01-07 18:00       ` [Bridge] " Stephen Hemminger
2005-01-07 18:06       ` David Woodhouse
2005-01-07 18:06         ` [Bridge] " David Woodhouse
2005-01-07 21:27       ` Bart De Schuymer
2005-01-07 21:27         ` [Bridge] " Bart De Schuymer
2005-01-18 21:57         ` David S. Miller
2005-01-18 21:57           ` [Bridge] " David S. Miller
2005-01-22 22:30           ` [PATCH/RFC] Reduce call chain length in netfilter (was: Re: do_IRQ: stack overflow: 872..) Bart De Schuymer
2005-01-22 22:30             ` [Bridge] " Bart De Schuymer
2005-01-22 23:22             ` Martin Josefsson
2005-01-22 23:22               ` [Bridge] " Martin Josefsson
2005-01-23 12:40               ` Bart De Schuymer
2005-01-23 12:40                 ` [Bridge] " Bart De Schuymer
2005-01-23 16:08                 ` Martin Josefsson
2005-01-23 16:08                   ` [Bridge] " Martin Josefsson
2005-01-26  6:05                   ` David S. Miller
2005-01-26  6:05                     ` [Bridge] " David S. Miller
2005-01-26  9:08                     ` Bart De Schuymer
2005-01-26  9:08                       ` [Bridge] " Bart De Schuymer
2005-01-26 23:49                       ` [PATCH/RFC] Reduce call chain length in netfilter Patrick McHardy
2005-01-26 23:49                         ` [Bridge] " Patrick McHardy
2005-01-27  7:18                         ` David S. Miller
2005-01-27  7:18                           ` [Bridge] " David S. Miller
2005-01-27 17:50                           ` Patrick McHardy
2005-01-27 17:50                             ` [Bridge] " Patrick McHardy
2005-01-27 19:47                             ` David S. Miller
2005-01-27 19:47                               ` [Bridge] " David S. Miller
2005-01-27 21:16                               ` Bart De Schuymer
2005-01-27 21:16                                 ` [Bridge] " Bart De Schuymer
2005-01-27 22:48                               ` Patrick McHardy
2005-01-27 22:48                                 ` [Bridge] " Patrick McHardy
2005-01-27 23:24                                 ` David S. Miller
2005-01-27 23:24                                   ` [Bridge] " David S. Miller
2005-01-28  0:08                                   ` Patrick McHardy
2005-01-28  0:08                                     ` [Bridge] " Patrick McHardy
2005-01-28  0:29                                   ` Rusty Russell
2005-01-28  0:29                                     ` [Bridge] " Rusty Russell
2005-01-28  1:10                                     ` David S. Miller
2005-01-28  1:10                                       ` [Bridge] " David S. Miller
2005-01-28  1:32                                       ` Rusty Russell
2005-01-28  1:32                                         ` [Bridge] " Rusty Russell
2005-01-28  1:35                                         ` Patrick McHardy
2005-01-28  1:35                                           ` [Bridge] " Patrick McHardy
2004-12-18  6:27 do_IRQ: stack overflow: 872 Crazy AMD K7

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.