linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* ctnetlink loop
@ 2010-12-03 13:39 Holger Eitzenberger
  2010-12-03 13:58 ` Holger Eitzenberger
  2010-12-09 10:56 ` Pablo Neira Ayuso
  0 siblings, 2 replies; 5+ messages in thread
From: Holger Eitzenberger @ 2010-12-03 13:39 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, LKML

[-- Attachment #1: Type: text/plain, Size: 1349 bytes --]

Hi,

I see a problem with how ctnetlink GET requests are being
processed in the kernel (2.6.32.24) under high load.

Initially I saw this problem on a large performance testing
system when getting HTTP proxy performance numbers, but lately
there have been two reports on large customers boxes (both
many-core with 10G NICs).

The sympton is Netlink looping around nfnetlink_rcv_msg(),
which is just because netlink_unicast() came back with -EAGAIN
when trying to write the newly created Netlink skb to the SK
receive buffer in ctnetlink_get_conntrack().  In this case a
(possibly) infinit loop is entered.  Mostly infinit in fact in
case the userland party trying to receive those messages may
be stuck in the sendmsg() call, being unable to read anything
if being single threaded.

I tried to reproduce several times, a few times the loop
disappeared and the box proceeded normally after some time.
I have no explanation for this.

The attached patch tries to solve it by simple not trying
again to netlink_unicast() the reply skb and just fail with
-ENOBUFS.  The reasoning is that at the point a Netlink overrun
is observed it seems counter intuitive to insist on sending
one more Netlink message.

I checked for possible side effects to other Netlink requests,
please check.

The patch applies to net-next-2.6.

Feedback appreciated.

 /holger


[-- Attachment #2: nfnl-fix.diff --]
[-- Type: text/x-diff, Size: 1696 bytes --]

nfnetlink: avoid unbound loop on busy Netlink socket

I see a problem with how ctnetlink GET requests are being
processed in the kernel (2.6.32.24) under high load.

The sympton is Netlink looping around nfnetlink_rcv_msg(), which
is just because netlink_unicast() came back with EAGAIN when
trying to write the newly created Netlink skb to the SK receive
buffer in ctnetlink_get_conntrack().  In this case a (possibly)
infinit loop is entered.  Mostly infinit I think in case the
userland party trying to receive those messages may be stuck in
the sendmsg() call, being unable to read anything if being single
threaded.

I tried to reproduce several times, a few times the loop
disappeared and the box proceeded normally after some minutes.
I have no explanation for this.

The attached patch tries to solve it by simple not trying again
to netlink_unicast() the reply skb and just fail with -ENOBUFS.
The reasoning is that at the point a Netlink overrun is detected
it seems counter intuitive to insist on sending one more Netlink
message.

Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>

Index: net-next-2.6/net/netfilter/nfnetlink.c
===================================================================
--- net-next-2.6.orig/net/netfilter/nfnetlink.c	2010-12-03 14:33:32.000000000 +0100
+++ net-next-2.6/net/netfilter/nfnetlink.c	2010-12-03 14:34:21.000000000 +0100
@@ -138,7 +138,6 @@
 		return 0;
 
 	type = nlh->nlmsg_type;
-replay:
 	ss = nfnetlink_get_subsys(type);
 	if (!ss) {
 #ifdef CONFIG_MODULES
@@ -169,7 +168,7 @@
 
 		err = nc->call(net->nfnl, skb, nlh, (const struct nlattr **)cda);
 		if (err == -EAGAIN)
-			goto replay;
+			err = -ENOBUFS;
 		return err;
 	}
 }

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: ctnetlink loop
  2010-12-03 13:39 ctnetlink loop Holger Eitzenberger
@ 2010-12-03 13:58 ` Holger Eitzenberger
  2010-12-09 10:56 ` Pablo Neira Ayuso
  1 sibling, 0 replies; 5+ messages in thread
From: Holger Eitzenberger @ 2010-12-03 13:58 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, LKML

[changed netfilter-devel address]

I is even stuck for O_NONBLOCK sockets, which may be due to nfnl_mutex
being held in the receive path.

 /holger


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: ctnetlink loop
  2010-12-03 13:39 ctnetlink loop Holger Eitzenberger
  2010-12-03 13:58 ` Holger Eitzenberger
@ 2010-12-09 10:56 ` Pablo Neira Ayuso
  2010-12-09 15:23   ` Holger Eitzenberger
  2010-12-10 22:01   ` David Miller
  1 sibling, 2 replies; 5+ messages in thread
From: Pablo Neira Ayuso @ 2010-12-09 10:56 UTC (permalink / raw)
  To: netfilter-devel, netdev, LKML; +Cc: David S. Miller

[-- Attachment #1: Type: text/plain, Size: 1298 bytes --]

Sorry, I finally found your email reporting this:

> nfnetlink: avoid unbound loop on busy Netlink socket
> 
> I see a problem with how ctnetlink GET requests are being
> processed in the kernel (2.6.32.24) under high load.
> 
> The sympton is Netlink looping around nfnetlink_rcv_msg(), which
> is just because netlink_unicast() came back with EAGAIN when
> trying to write the newly created Netlink skb to the SK receive
> buffer in ctnetlink_get_conntrack().  In this case a (possibly)
> infinit loop is entered.  Mostly infinit I think in case the
> userland party trying to receive those messages may be stuck in
> the sendmsg() call, being unable to read anything if being single
> threaded.
> 
> I tried to reproduce several times, a few times the loop
> disappeared and the box proceeded normally after some minutes.
> I have no explanation for this.
> 
> The attached patch tries to solve it by simple not trying again
> to netlink_unicast() the reply skb and just fail with -ENOBUFS.
> The reasoning is that at the point a Netlink overrun is detected
> it seems counter intuitive to insist on sending one more Netlink
> message.

We still need EAGAIN, and it doesn't necessarily means ENOBUFS for the
general case in nfnetlink.

The following patch covers the case that you're reporting.

[-- Attachment #2: f.patch --]
[-- Type: text/x-patch, Size: 1164 bytes --]

netfilter: ctnetlink: fix loop in ctnetlink_get_conntrack()

From: Pablo Neira Ayuso <pablo@netfilter.org>

This patch fixes a loop in ctnetlink_get_conntrack() that can be
triggered if you use the same socket to receive events and to
perform a GET operation. Under heavy load, netlink_unicast()
may return -EAGAIN, this error code is reserved in nfnetlink for
the module load-on-demand. Instead, we return -ENOBUFS which is
the appropriate error code that has to be propagated to
user-space.

Reported-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_netlink.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index b729ace..a84fa6f 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -973,7 +973,8 @@ ctnetlink_get_conntrack(struct sock *ctnl, struct sk_buff *skb,
 free:
 	kfree_skb(skb2);
 out:
-	return err;
+	/* this avoids a loop in nfnetlink. */
+	return err == -EAGAIN ? -ENOBUFS : err;
 }
 
 #ifdef CONFIG_NF_NAT_NEEDED

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: ctnetlink loop
  2010-12-09 10:56 ` Pablo Neira Ayuso
@ 2010-12-09 15:23   ` Holger Eitzenberger
  2010-12-10 22:01   ` David Miller
  1 sibling, 0 replies; 5+ messages in thread
From: Holger Eitzenberger @ 2010-12-09 15:23 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, netdev, LKML, David S. Miller

> Sorry, I finally found your email reporting this:
> 
> > nfnetlink: avoid unbound loop on busy Netlink socket

Fair enough, thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: ctnetlink loop
  2010-12-09 10:56 ` Pablo Neira Ayuso
  2010-12-09 15:23   ` Holger Eitzenberger
@ 2010-12-10 22:01   ` David Miller
  1 sibling, 0 replies; 5+ messages in thread
From: David Miller @ 2010-12-10 22:01 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev, linux-kernel

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Thu, 09 Dec 2010 11:56:13 +0100

> netfilter: ctnetlink: fix loop in ctnetlink_get_conntrack()
> 
> From: Pablo Neira Ayuso <pablo@netfilter.org>
> 
> This patch fixes a loop in ctnetlink_get_conntrack() that can be
> triggered if you use the same socket to receive events and to
> perform a GET operation. Under heavy load, netlink_unicast()
> may return -EAGAIN, this error code is reserved in nfnetlink for
> the module load-on-demand. Instead, we return -ENOBUFS which is
> the appropriate error code that has to be propagated to
> user-space.
> 
> Reported-by: Holger Eitzenberger <holger@eitzenberger.org>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Since Patrick seems to be inactive I have applied this directly
to net-2.6, thanks guys!

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-12-10 22:01 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-03 13:39 ctnetlink loop Holger Eitzenberger
2010-12-03 13:58 ` Holger Eitzenberger
2010-12-09 10:56 ` Pablo Neira Ayuso
2010-12-09 15:23   ` Holger Eitzenberger
2010-12-10 22:01   ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).