* [PATCH 1/8] net: add limit for socket backlog
@ 2010-03-03  6:35 Zhu Yi
  2010-03-03  6:35 ` [PATCH 2/8] dccp: use limited " Zhu Yi
  2010-03-03  6:54 ` [PATCH 1/8] net: add limit for " Eric Dumazet
  0 siblings, 2 replies; 21+ messages in thread
From: Zhu Yi @ 2010-03-03  6:35 UTC (permalink / raw)
  To: netdev
  Cc: Zhu Yi, David Miller, Arnaldo Carvalho de Melo,
	Pekka Savola (ipv6),
	Patrick McHardy, Vlad Yasevich, Sridhar Samudrala, Per Liden,
	Jon Maloy, Allan Stephens, Andrew Hendry, Eric Dumazet

We got a system OOM while running some UDP netperf tests over the loopback
device. In this case, multiple senders sent streams of UDP packets to a
single receiver via loopback on the local host. Naturally, the receiver was
not able to handle all the packets in time. But we were surprised to find
that these packets were not discarded due to the receiver's sk->sk_rcvbuf
limit. Instead, they kept queuing up on sk->sk_backlog and finally ate up
all the memory. We believe this is a security hole that a non-privileged
user can exploit to crash the system.
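
For reference, the sender side of the test can be approximated with a few
processes along these lines (an illustrative sketch only, not the exact
netperf setup; the port number is arbitrary):

/* udp_flood.c - run several instances against a receiver that reads
 * slowly from 127.0.0.1:9999.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	struct sockaddr_in dst;
	char buf[1024];

	memset(&dst, 0, sizeof(dst));
	memset(buf, 0, sizeof(buf));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(9999);
	dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	for (;;)	/* each sender transmits as fast as it can */
		sendto(fd, buf, sizeof(buf), 0,
		       (struct sockaddr *)&dst, sizeof(dst));
	return 0;
}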

The root cause of this problem is that when the receiver is in
__release_sock() (i.e. after a userspace recv; kernel udp_recvmsg ->
skb_free_datagram_locked -> release_sock), it moves skbs from the backlog
to sk_receive_queue with softirqs enabled. In the above case, multiple
busy senders can turn this into a nearly endless loop, and the skbs in
the backlog end up eating all the system memory.
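
The loop in question looks roughly like this (simplified from
net/core/sock.c, before this patch):

static void __release_sock(struct sock *sk)
{
	struct sk_buff *skb = sk->sk_backlog.head;

	do {
		sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
		bh_unlock_sock(sk);	/* softirq producers may append again */

		do {
			struct sk_buff *next = skb->next;

			skb->next = NULL;
			sk_backlog_rcv(sk, skb);
			skb = next;
		} while (skb != NULL);

		bh_lock_sock(sk);
		/* never exits while senders keep the backlog non-empty */
	} while ((skb = sk->sk_backlog.head) != NULL);
}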

The issue is not limited to UDP: any protocol using the socket backlog is
potentially affected. This patch adds a limit to the socket backlog so
that its size cannot grow endlessly.

Reported-by: Alex Shi <alex.shi@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: Per Liden <per.liden@ericsson.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
---
 include/net/sock.h |   17 +++++++++++++++--
 net/core/sock.c    |   15 +++++++++++++--
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 6cb1676..847119a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -253,6 +253,8 @@ struct sock {
 	struct {
 		struct sk_buff *head;
 		struct sk_buff *tail;
+		int len;
+		int limit;
 	} sk_backlog;
 	wait_queue_head_t	*sk_sleep;
 	struct dst_entry	*sk_dst_cache;
@@ -589,8 +591,8 @@ static inline int sk_stream_memory_free(struct sock *sk)
 	return sk->sk_wmem_queued < sk->sk_sndbuf;
 }
 
-/* The per-socket spinlock must be held here. */
-static inline void sk_add_backlog(struct sock *sk, struct sk_buff *skb)
+/* OOB backlog add */
+static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
 {
 	if (!sk->sk_backlog.tail) {
 		sk->sk_backlog.head = sk->sk_backlog.tail = skb;
@@ -601,6 +603,17 @@ static inline void sk_add_backlog(struct sock *sk, struct sk_buff *skb)
 	skb->next = NULL;
 }
 
+/* The per-socket spinlock must be held here. */
+static inline int sk_add_backlog(struct sock *sk, struct sk_buff *skb)
+{
+	if (sk->sk_backlog.len >= max(sk->sk_backlog.limit, sk->sk_rcvbuf >> 1))
+		return -ENOBUFS;
+
+	__sk_add_backlog(sk, skb);
+	sk->sk_backlog.len += skb->truesize;
+	return 0;
+}
+
 static inline int sk_backlog_rcv(struct sock *sk, struct sk_buff *skb)
 {
 	return sk->sk_backlog_rcv(sk, skb);
diff --git a/net/core/sock.c b/net/core/sock.c
index fcd397a..fa042bc 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -340,8 +340,12 @@ int sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested)
 		rc = sk_backlog_rcv(sk, skb);
 
 		mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
-	} else
-		sk_add_backlog(sk, skb);
+	} else if (sk_add_backlog(sk, skb)) {
+		bh_unlock_sock(sk);
+		atomic_inc(&sk->sk_drops);
+		goto discard_and_relse;
+	}
+
 	bh_unlock_sock(sk);
 out:
 	sock_put(sk);
@@ -1139,6 +1142,7 @@ struct sock *sk_clone(const struct sock *sk, const gfp_t priority)
 		sock_lock_init(newsk);
 		bh_lock_sock(newsk);
 		newsk->sk_backlog.head	= newsk->sk_backlog.tail = NULL;
+		newsk->sk_backlog.len = 0;
 
 		atomic_set(&newsk->sk_rmem_alloc, 0);
 		/*
@@ -1542,6 +1546,12 @@ static void __release_sock(struct sock *sk)
 
 		bh_lock_sock(sk);
 	} while ((skb = sk->sk_backlog.head) != NULL);
+
+	/*
+	 * Doing the zeroing here guarantees we cannot loop forever
+	 * while a wild producer attempts to flood us.
+	 */
+	sk->sk_backlog.len = 0;
 }
 
 /**
@@ -1874,6 +1884,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 	sk->sk_allocation	=	GFP_KERNEL;
 	sk->sk_rcvbuf		=	sysctl_rmem_default;
 	sk->sk_sndbuf		=	sysctl_wmem_default;
+	sk->sk_backlog.limit	=	sk->sk_rcvbuf >> 1;
 	sk->sk_state		=	TCP_CLOSE;
 	sk_set_socket(sk, sock);
 
-- 
1.6.3.3



* [PATCH 2/8] dccp: use limited socket backlog
  2010-03-03  6:35 [PATCH 1/8] net: add limit for socket backlog Zhu Yi
@ 2010-03-03  6:35 ` Zhu Yi
  2010-03-03  6:35   ` [PATCH 3/8] tcp: " Zhu Yi
  2010-03-03  6:56   ` [PATCH 2/8] dccp: " Eric Dumazet
  2010-03-03  6:54 ` [PATCH 1/8] net: add limit for " Eric Dumazet
  1 sibling, 2 replies; 21+ messages in thread
From: Zhu Yi @ 2010-03-03  6:35 UTC (permalink / raw)
  To: netdev; +Cc: Zhu Yi, Arnaldo Carvalho de Melo

Make dccp adapt to the limited socket backlog change.

Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
---
 net/dccp/minisocks.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c
index af226a0..0d508c3 100644
--- a/net/dccp/minisocks.c
+++ b/net/dccp/minisocks.c
@@ -254,7 +254,7 @@ int dccp_child_process(struct sock *parent, struct sock *child,
 		 * in main socket hash table and lock on listening
 		 * socket does not protect us more.
 		 */
-		sk_add_backlog(child, skb);
+		__sk_add_backlog(child, skb);
 	}
 
 	bh_unlock_sock(child);
-- 
1.6.3.3



* [PATCH 3/8] tcp: use limited socket backlog
  2010-03-03  6:35 ` [PATCH 2/8] dccp: use limited " Zhu Yi
@ 2010-03-03  6:35   ` Zhu Yi
  2010-03-03  6:35     ` [PATCH 4/8] udp: " Zhu Yi
  2010-03-03  6:56   ` [PATCH 2/8] dccp: " Eric Dumazet
  1 sibling, 1 reply; 21+ messages in thread
From: Zhu Yi @ 2010-03-03  6:35 UTC (permalink / raw)
  To: netdev
  Cc: Zhu Yi, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	Patrick McHardy

Make tcp adapt to the limited socket backlog change.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
---
 net/ipv4/tcp_ipv4.c      |    6 ++++--
 net/ipv4/tcp_minisocks.c |    2 +-
 net/ipv6/tcp_ipv6.c      |    6 ++++--
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index c3588b4..1915f7d 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1682,8 +1682,10 @@ process:
 			if (!tcp_prequeue(sk, skb))
 				ret = tcp_v4_do_rcv(sk, skb);
 		}
-	} else
-		sk_add_backlog(sk, skb);
+	} else if (sk_add_backlog(sk, skb)) {
+		bh_unlock_sock(sk);
+		goto discard_and_relse;
+	}
 	bh_unlock_sock(sk);
 
 	sock_put(sk);
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index f206ee5..4199bc6 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -728,7 +728,7 @@ int tcp_child_process(struct sock *parent, struct sock *child,
 		 * in main socket hash table and lock on listening
 		 * socket does not protect us more.
 		 */
-		sk_add_backlog(child, skb);
+		__sk_add_backlog(child, skb);
 	}
 
 	bh_unlock_sock(child);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 6963a6b..2c378b1 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1740,8 +1740,10 @@ process:
 			if (!tcp_prequeue(sk, skb))
 				ret = tcp_v6_do_rcv(sk, skb);
 		}
-	} else
-		sk_add_backlog(sk, skb);
+	} else if (sk_add_backlog(sk, skb)) {
+		bh_unlock_sock(sk);
+		goto discard_and_relse;
+	}
 	bh_unlock_sock(sk);
 
 	sock_put(sk);
-- 
1.6.3.3



* [PATCH 4/8] udp: use limited socket backlog
  2010-03-03  6:35   ` [PATCH 3/8] tcp: " Zhu Yi
@ 2010-03-03  6:35     ` Zhu Yi
  2010-03-03  6:35       ` [PATCH 5/8] llc: " Zhu Yi
  0 siblings, 1 reply; 21+ messages in thread
From: Zhu Yi @ 2010-03-03  6:35 UTC (permalink / raw)
  To: netdev
  Cc: Zhu Yi, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	Patrick McHardy

Make udp adapt to the limited socket backlog change.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
---
 net/ipv4/udp.c |    6 ++++--
 net/ipv6/udp.c |   28 ++++++++++++++++++----------
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 608a544..7af756d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1371,8 +1371,10 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	bh_lock_sock(sk);
 	if (!sock_owned_by_user(sk))
 		rc = __udp_queue_rcv_skb(sk, skb);
-	else
-		sk_add_backlog(sk, skb);
+	else if (sk_add_backlog(sk, skb)) {
+		bh_unlock_sock(sk);
+		goto drop;
+	}
 	bh_unlock_sock(sk);
 
 	return rc;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 52b8347..3c0c9c7 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -583,16 +583,20 @@ static void flush_stack(struct sock **stack, unsigned int count,
 			bh_lock_sock(sk);
 			if (!sock_owned_by_user(sk))
 				udpv6_queue_rcv_skb(sk, skb1);
-			else
-				sk_add_backlog(sk, skb1);
+			else if (sk_add_backlog(sk, skb1)) {
+				kfree_skb(skb1);
+				bh_unlock_sock(sk);
+				goto drop;
+			}
 			bh_unlock_sock(sk);
-		} else {
-			atomic_inc(&sk->sk_drops);
-			UDP6_INC_STATS_BH(sock_net(sk),
-					UDP_MIB_RCVBUFERRORS, IS_UDPLITE(sk));
-			UDP6_INC_STATS_BH(sock_net(sk),
-					UDP_MIB_INERRORS, IS_UDPLITE(sk));
+			continue;
 		}
+drop:
+		atomic_inc(&sk->sk_drops);
+		UDP6_INC_STATS_BH(sock_net(sk),
+				UDP_MIB_RCVBUFERRORS, IS_UDPLITE(sk));
+		UDP6_INC_STATS_BH(sock_net(sk),
+				UDP_MIB_INERRORS, IS_UDPLITE(sk));
 	}
 }
 /*
@@ -754,8 +758,12 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	bh_lock_sock(sk);
 	if (!sock_owned_by_user(sk))
 		udpv6_queue_rcv_skb(sk, skb);
-	else
-		sk_add_backlog(sk, skb);
+	else if (sk_add_backlog(sk, skb)) {
+		atomic_inc(&sk->sk_drops);
+		bh_unlock_sock(sk);
+		sock_put(sk);
+		goto discard;
+	}
 	bh_unlock_sock(sk);
 	sock_put(sk);
 	return 0;
-- 
1.6.3.3



* [PATCH 5/8] llc: use limited socket backlog
  2010-03-03  6:35     ` [PATCH 4/8] udp: " Zhu Yi
@ 2010-03-03  6:35       ` Zhu Yi
  2010-03-03  6:35         ` [PATCH 6/8] sctp: " Zhu Yi
  0 siblings, 1 reply; 21+ messages in thread
From: Zhu Yi @ 2010-03-03  6:35 UTC (permalink / raw)
  To: netdev; +Cc: Zhu Yi, Arnaldo Carvalho de Melo

Make llc adapt to the limited socket backlog change.

Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
---
 net/llc/llc_c_ac.c |    2 +-
 net/llc/llc_conn.c |    3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/llc/llc_c_ac.c b/net/llc/llc_c_ac.c
index 019c780..86d6985 100644
--- a/net/llc/llc_c_ac.c
+++ b/net/llc/llc_c_ac.c
@@ -1437,7 +1437,7 @@ static void llc_process_tmr_ev(struct sock *sk, struct sk_buff *skb)
 			llc_conn_state_process(sk, skb);
 		else {
 			llc_set_backlog_type(skb, LLC_EVENT);
-			sk_add_backlog(sk, skb);
+			__sk_add_backlog(sk, skb);
 		}
 	}
 }
diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c
index a8dde9b..a12144d 100644
--- a/net/llc/llc_conn.c
+++ b/net/llc/llc_conn.c
@@ -827,7 +827,8 @@ void llc_conn_handler(struct llc_sap *sap, struct sk_buff *skb)
 	else {
 		dprintk("%s: adding to backlog...\n", __func__);
 		llc_set_backlog_type(skb, LLC_PACKET);
-		sk_add_backlog(sk, skb);
+		if (sk_add_backlog(sk, skb))
+			goto drop_unlock;
 	}
 out:
 	bh_unlock_sock(sk);
-- 
1.6.3.3



* [PATCH 6/8] sctp: use limited socket backlog
  2010-03-03  6:35       ` [PATCH 5/8] llc: " Zhu Yi
@ 2010-03-03  6:35         ` Zhu Yi
  2010-03-03  6:35           ` [PATCH 7/8] tipc: " Zhu Yi
  0 siblings, 1 reply; 21+ messages in thread
From: Zhu Yi @ 2010-03-03  6:35 UTC (permalink / raw)
  To: netdev; +Cc: Zhu Yi, Vlad Yasevich, Sridhar Samudrala

Make sctp adapt to the limited socket backlog change.

Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
---
 net/sctp/input.c |   14 +++++++++-----
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/net/sctp/input.c b/net/sctp/input.c
index c0c973e..ca42181 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -75,7 +75,7 @@ static struct sctp_association *__sctp_lookup_association(
 					const union sctp_addr *peer,
 					struct sctp_transport **pt);
 
-static void sctp_add_backlog(struct sock *sk, struct sk_buff *skb);
+static int sctp_add_backlog(struct sock *sk, struct sk_buff *skb);
 
 
 /* Calculate the SCTP checksum of an SCTP packet.  */
@@ -265,8 +265,12 @@ int sctp_rcv(struct sk_buff *skb)
 	}
 
 	if (sock_owned_by_user(sk)) {
+		if (sctp_add_backlog(sk, skb)) {
+			sctp_bh_unlock_sock(sk);
+			sctp_chunk_free(chunk);
+			goto discard_release;
+		}
 		SCTP_INC_STATS_BH(SCTP_MIB_IN_PKT_BACKLOG);
-		sctp_add_backlog(sk, skb);
 	} else {
 		SCTP_INC_STATS_BH(SCTP_MIB_IN_PKT_SOFTIRQ);
 		sctp_inq_push(&chunk->rcvr->inqueue, chunk);
@@ -336,7 +340,7 @@ int sctp_backlog_rcv(struct sock *sk, struct sk_buff *skb)
 		sctp_bh_lock_sock(sk);
 
 		if (sock_owned_by_user(sk)) {
-			sk_add_backlog(sk, skb);
+			__sk_add_backlog(sk, skb);
 			backloged = 1;
 		} else
 			sctp_inq_push(inqueue, chunk);
@@ -362,7 +366,7 @@ done:
 	return 0;
 }
 
-static void sctp_add_backlog(struct sock *sk, struct sk_buff *skb)
+static int sctp_add_backlog(struct sock *sk, struct sk_buff *skb)
 {
 	struct sctp_chunk *chunk = SCTP_INPUT_CB(skb)->chunk;
 	struct sctp_ep_common *rcvr = chunk->rcvr;
@@ -377,7 +381,7 @@ static void sctp_add_backlog(struct sock *sk, struct sk_buff *skb)
 	else
 		BUG();
 
-	sk_add_backlog(sk, skb);
+	return sk_add_backlog(sk, skb);
 }
 
 /* Handle icmp frag needed error. */
-- 
1.6.3.3



* [PATCH 7/8] tipc: use limited socket backlog
  2010-03-03  6:35         ` [PATCH 6/8] sctp: " Zhu Yi
@ 2010-03-03  6:35           ` Zhu Yi
  2010-03-03  6:35             ` [PATCH 8/8] x25: " Zhu Yi
  0 siblings, 1 reply; 21+ messages in thread
From: Zhu Yi @ 2010-03-03  6:35 UTC (permalink / raw)
  To: netdev; +Cc: Zhu Yi, Per Liden, Jon Maloy, Allan Stephens

Make tipc adapt to the limited socket backlog change.

Cc: Per Liden <per.liden@ericsson.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
---
 net/tipc/socket.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 1ea64f0..4b235fc 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1322,8 +1322,10 @@ static u32 dispatch(struct tipc_port *tport, struct sk_buff *buf)
 	if (!sock_owned_by_user(sk)) {
 		res = filter_rcv(sk, buf);
 	} else {
-		sk_add_backlog(sk, buf);
-		res = TIPC_OK;
+		if (sk_add_backlog(sk, buf))
+			res = TIPC_ERR_OVERLOAD;
+		else
+			res = TIPC_OK;
 	}
 	bh_unlock_sock(sk);
 
-- 
1.6.3.3



* [PATCH 8/8] x25: use limited socket backlog
  2010-03-03  6:35           ` [PATCH 7/8] tipc: " Zhu Yi
@ 2010-03-03  6:35             ` Zhu Yi
  2010-03-03  7:08               ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Zhu Yi @ 2010-03-03  6:35 UTC (permalink / raw)
  To: netdev; +Cc: Zhu Yi, Andrew Hendry

Make x25 adapt to the limited socket backlog change.

Cc: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
---
 net/x25/x25_dev.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/x25/x25_dev.c b/net/x25/x25_dev.c
index 3e1efe5..5688123 100644
--- a/net/x25/x25_dev.c
+++ b/net/x25/x25_dev.c
@@ -53,7 +53,7 @@ static int x25_receive_data(struct sk_buff *skb, struct x25_neigh *nb)
 		if (!sock_owned_by_user(sk)) {
 			queued = x25_process_rx_frame(sk, skb);
 		} else {
-			sk_add_backlog(sk, skb);
+			__sk_add_backlog(sk, skb);
 		}
 		bh_unlock_sock(sk);
 		sock_put(sk);
-- 
1.6.3.3



* Re: [PATCH 1/8] net: add limit for socket backlog
  2010-03-03  6:35 [PATCH 1/8] net: add limit for socket backlog Zhu Yi
  2010-03-03  6:35 ` [PATCH 2/8] dccp: use limited " Zhu Yi
@ 2010-03-03  6:54 ` Eric Dumazet
  2010-03-03  7:35   ` Zhu Yi
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2010-03-03  6:54 UTC (permalink / raw)
  To: Zhu Yi
  Cc: netdev, David Miller, Arnaldo Carvalho de Melo,
	Pekka Savola (ipv6),
	Patrick McHardy, Vlad Yasevich, Sridhar Samudrala, Per Liden,
	Jon Maloy, Allan Stephens, Andrew Hendry

On Wednesday 03 March 2010 at 14:35 +0800, Zhu Yi wrote:
> We got a system OOM while running some UDP netperf tests over the loopback
> device. In this case, multiple senders sent streams of UDP packets to a
> single receiver via loopback on the local host. Naturally, the receiver was
> not able to handle all the packets in time. But we were surprised to find
> that these packets were not discarded due to the receiver's sk->sk_rcvbuf
> limit. Instead, they kept queuing up on sk->sk_backlog and finally ate up
> all the memory. We believe this is a security hole that a non-privileged
> user can exploit to crash the system.
> 
> The root cause of this problem is that when the receiver is in
> __release_sock() (i.e. after a userspace recv; kernel udp_recvmsg ->
> skb_free_datagram_locked -> release_sock), it moves skbs from the backlog
> to sk_receive_queue with softirqs enabled. In the above case, multiple
> busy senders can turn this into a nearly endless loop, and the skbs in
> the backlog end up eating all the system memory.
> 
> The issue is not limited to UDP: any protocol using the socket backlog is
> potentially affected. This patch adds a limit to the socket backlog so
> that its size cannot grow endlessly.
> 
> Reported-by: Alex Shi <alex.shi@intel.com>
> Cc: David Miller <davem@davemloft.net>
> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
> Cc: Patrick McHardy <kaber@trash.net>
> Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
> Cc: Sridhar Samudrala <sri@us.ibm.com>
> Cc: Per Liden <per.liden@ericsson.com>
> Cc: Jon Maloy <jon.maloy@ericsson.com>
> Cc: Allan Stephens <allan.stephens@windriver.com>
> Cc: Andrew Hendry <andrew.hendry@gmail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Zhu Yi <yi.zhu@intel.com>

Your SOB should come before mine; you are the main author of this patch, I
am only a contributor.

> ---
>  include/net/sock.h |   17 +++++++++++++++--
>  net/core/sock.c    |   15 +++++++++++++--
>  2 files changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 6cb1676..847119a 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -253,6 +253,8 @@ struct sock {
>  	struct {
>  		struct sk_buff *head;
>  		struct sk_buff *tail;
> +		int len;
> +		int limit;

This new limit field is really not needed

>  	} sk_backlog;
>  	wait_queue_head_t	*sk_sleep;
>  	struct dst_entry	*sk_dst_cache;
> @@ -589,8 +591,8 @@ static inline int sk_stream_memory_free(struct sock *sk)
>  	return sk->sk_wmem_queued < sk->sk_sndbuf;
>  }
>  
> -/* The per-socket spinlock must be held here. */
> -static inline void sk_add_backlog(struct sock *sk, struct sk_buff *skb)
> +/* OOB backlog add */
> +static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
>  {
>  	if (!sk->sk_backlog.tail) {
>  		sk->sk_backlog.head = sk->sk_backlog.tail = skb;
> @@ -601,6 +603,17 @@ static inline void sk_add_backlog(struct sock *sk, struct sk_buff *skb)
>  	skb->next = NULL;
>  }
>  
> +/* The per-socket spinlock must be held here. */
> +static inline int sk_add_backlog(struct sock *sk, struct sk_buff *skb)
> +{
> +	if (sk->sk_backlog.len >= max(sk->sk_backlog.limit, sk->sk_rcvbuf >> 1))
> +		return -ENOBUFS;
> +
> +	__sk_add_backlog(sk, skb);
> +	sk->sk_backlog.len += skb->truesize;
> +	return 0;
> +}
> +

Ouch, this patch breaks bisection, since all protocols currently ignore
the -ENOBUFS value and don't free the skb.

If you split your work into several patches, you still have to make the
resulting kernels usable.
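
With only this patch applied, every unconverted caller still looks like
the sketch below, so a refused skb is neither queued nor freed:

	} else
		/* return value silently ignored: skb leaked on -ENOBUFS */
		sk_add_backlog(sk, skb);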

>  static inline int sk_backlog_rcv(struct sock *sk, struct sk_buff *skb)
>  {
>  	return sk->sk_backlog_rcv(sk, skb);
> diff --git a/net/core/sock.c b/net/core/sock.c
> index fcd397a..fa042bc 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -340,8 +340,12 @@ int sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested)
>  		rc = sk_backlog_rcv(sk, skb);
>  
>  		mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
> -	} else
> -		sk_add_backlog(sk, skb);
> +	} else if (sk_add_backlog(sk, skb)) {
> +		bh_unlock_sock(sk);
> +		atomic_inc(&sk->sk_drops);
> +		goto discard_and_relse;
> +	}
> +
>  	bh_unlock_sock(sk);
>  out:
>  	sock_put(sk);
> @@ -1139,6 +1142,7 @@ struct sock *sk_clone(const struct sock *sk, const gfp_t priority)
>  		sock_lock_init(newsk);
>  		bh_lock_sock(newsk);
>  		newsk->sk_backlog.head	= newsk->sk_backlog.tail = NULL;
> +		newsk->sk_backlog.len = 0;
>  
>  		atomic_set(&newsk->sk_rmem_alloc, 0);
>  		/*
> @@ -1542,6 +1546,12 @@ static void __release_sock(struct sock *sk)
>  
>  		bh_lock_sock(sk);
>  	} while ((skb = sk->sk_backlog.head) != NULL);
> +
> +	/*
> +	 * Doing the zeroing here guarantees we cannot loop forever
> +	 * while a wild producer attempts to flood us.
> +	 */
> +	sk->sk_backlog.len = 0;
>  }
>  
>  /**
> @@ -1874,6 +1884,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
>  	sk->sk_allocation	=	GFP_KERNEL;
>  	sk->sk_rcvbuf		=	sysctl_rmem_default;
>  	sk->sk_sndbuf		=	sysctl_wmem_default;
> +	sk->sk_backlog.limit	=	sk->sk_rcvbuf >> 1;

Didn't we agree to use sk->sk_rcvbuf << 1 in the previous round?

>  	sk->sk_state		=	TCP_CLOSE;
>  	sk_set_socket(sk, sock);
>  

* Re: [PATCH 2/8] dccp: use limited socket backlog
  2010-03-03  6:35 ` [PATCH 2/8] dccp: use limited " Zhu Yi
  2010-03-03  6:35   ` [PATCH 3/8] tcp: " Zhu Yi
@ 2010-03-03  6:56   ` Eric Dumazet
  2010-03-03  7:43     ` Zhu Yi
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2010-03-03  6:56 UTC (permalink / raw)
  To: Zhu Yi; +Cc: netdev, Arnaldo Carvalho de Melo

On Wednesday 03 March 2010 at 14:35 +0800, Zhu Yi wrote:
> Make dccp adapt to the limited socket backlog change.
> 
> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> Signed-off-by: Zhu Yi <yi.zhu@intel.com>
> ---
>  net/dccp/minisocks.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c
> index af226a0..0d508c3 100644
> --- a/net/dccp/minisocks.c
> +++ b/net/dccp/minisocks.c
> @@ -254,7 +254,7 @@ int dccp_child_process(struct sock *parent, struct sock *child,
>  		 * in main socket hash table and lock on listening
>  		 * socket does not protect us more.
>  		 */
> -		sk_add_backlog(child, skb);
> +		__sk_add_backlog(child, skb);
>  	}
>  
>  	bh_unlock_sock(child);

I don't understand this patch.

You make dccp vulnerable to memory exhaustion; I thought you wanted to
solve this problem.

It should therefore be named "dccp: use unlimited socket backlog".

(And that doesn't sound so sexy :) )




* Re: [PATCH 8/8] x25: use limited socket backlog
  2010-03-03  6:35             ` [PATCH 8/8] x25: " Zhu Yi
@ 2010-03-03  7:08               ` Eric Dumazet
  2010-03-03 11:38                 ` andrew hendry
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2010-03-03  7:08 UTC (permalink / raw)
  To: Zhu Yi; +Cc: netdev, Andrew Hendry

On Wednesday 03 March 2010 at 14:35 +0800, Zhu Yi wrote:
> Make x25 adapt to the limited socket backlog change.
> 
> Cc: Andrew Hendry <andrew.hendry@gmail.com>
> Signed-off-by: Zhu Yi <yi.zhu@intel.com>
> ---
>  net/x25/x25_dev.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/x25/x25_dev.c b/net/x25/x25_dev.c
> index 3e1efe5..5688123 100644
> --- a/net/x25/x25_dev.c
> +++ b/net/x25/x25_dev.c
> @@ -53,7 +53,7 @@ static int x25_receive_data(struct sk_buff *skb, struct x25_neigh *nb)
>  		if (!sock_owned_by_user(sk)) {
>  			queued = x25_process_rx_frame(sk, skb);
>  		} else {
> -			sk_add_backlog(sk, skb);
> +			__sk_add_backlog(sk, skb);
>  		}
>  		bh_unlock_sock(sk);
>  		sock_put(sk);

Please respin your patch the other way.

I.e. let sk_add_backlog(sk, skb) do its previous job (not leaking skbs
and returning a void status).

Add a new function that limits the backlog and returns an error code, so
that the caller can free the skb and increment SNMP counters accordingly.

Callers MUST test the return value, or use another helper that frees the
skb for them.

Name it sk_move_backlog(), for example.

This will permit you to split the work as you tried.

sk_add_backlog() could be redefined as the helper:

void sk_add_backlog(sk, skb)
{
	if (sk_move_backlog(sk, skb)) {
		kfree_skb(skb);
		atomic_inc(&sk->sk_drops);
	}
	
}

Thanks




* Re: [PATCH 1/8] net: add limit for socket backlog
  2010-03-03  6:54 ` [PATCH 1/8] net: add limit for " Eric Dumazet
@ 2010-03-03  7:35   ` Zhu Yi
  2010-03-03  8:02     ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Zhu Yi @ 2010-03-03  7:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, David Miller, Arnaldo Carvalho de Melo,
	Pekka Savola (ipv6),
	Patrick McHardy, Vlad Yasevich, Sridhar Samudrala, Per Liden,
	Jon Maloy, Allan Stephens, Andrew Hendry

On Wed, 2010-03-03 at 14:54 +0800, Eric Dumazet wrote:
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 6cb1676..847119a 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -253,6 +253,8 @@ struct sock {
> >  	struct {
> >  		struct sk_buff *head;
> >  		struct sk_buff *tail;
> > +		int len;
> > +		int limit;
> 
> This new limit field is really not needed

sk->sk_rcvbuf can be adjusted via setsockopt from userspace at run time,
and the minimum allowed value is 256. I'm afraid this will break most
protocols.
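
For example, nothing stops an application from shrinking it (a sketch;
the kernel stores twice the requested value, clamped to a small floor,
256 here):

	int val = 128;

	/* sk->sk_rcvbuf becomes max(val * 2, SOCK_MIN_RCVBUF) */
	setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, sizeof(val));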

<...>

> Ouch, this patch breaks bisection, since all protocols currently ignore
> -ENOBUFS value and dont free skb
> 
> If you split your work on several patches, you still have to make
> resulting kernels usable.

OK. Will fix.

<...>

> > +	sk->sk_backlog.limit	=	sk->sk_rcvbuf >> 1;

Sorry, typo.

Thanks,
-yi



* Re: [PATCH 2/8] dccp: use limited socket backlog
  2010-03-03  6:56   ` [PATCH 2/8] dccp: " Eric Dumazet
@ 2010-03-03  7:43     ` Zhu Yi
  0 siblings, 0 replies; 21+ messages in thread
From: Zhu Yi @ 2010-03-03  7:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Arnaldo Carvalho de Melo

On Wed, 2010-03-03 at 14:56 +0800, Eric Dumazet wrote:
> On Wednesday 03 March 2010 at 14:35 +0800, Zhu Yi wrote:
> > Make dccp adapt to the limited socket backlog change.
> > 
> > Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
> > Signed-off-by: Zhu Yi <yi.zhu@intel.com>
> > ---
> >  net/dccp/minisocks.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c
> > index af226a0..0d508c3 100644
> > --- a/net/dccp/minisocks.c
> > +++ b/net/dccp/minisocks.c
> > @@ -254,7 +254,7 @@ int dccp_child_process(struct sock *parent, struct sock *child,
> >  		 * in main socket hash table and lock on listening
> >  		 * socket does not protect us more.
> >  		 */
> > -		sk_add_backlog(child, skb);
> > +		__sk_add_backlog(child, skb);
> >  	}
> >  
> >  	bh_unlock_sock(child);
> 
> I don't understand this patch.
> 
> You make dccp vulnerable to memory exhaustion; I thought you wanted to
> solve this problem.
> 
> It should therefore be named "dccp: use unlimited socket backlog".
> 
> (And that doesn't sound so sexy :) )

dccp uses sk_receive_skb(), which calls the sk_add_backlog() I fixed in
the first patch. This patch handles a different case: the parent sk is in
the LISTEN state and we want the child to process the packet. Should the
backlog limit be applied here too? I'm waiting for comments.
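
For reference, the surrounding code in dccp_child_process() is roughly:

	if (!sock_owned_by_user(child)) {
		ret = dccp_rcv_state_process(child, skb, dccp_hdr(skb),
					     skb->len);
		...
	} else {
		/* user holds the child lock, so queue to its backlog;
		 * this is the hunk the patch converts to __sk_add_backlog() */
		__sk_add_backlog(child, skb);
	}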

Thanks,
-yi


* Re: [PATCH 1/8] net: add limit for socket backlog
  2010-03-03  7:35   ` Zhu Yi
@ 2010-03-03  8:02     ` Eric Dumazet
  2010-03-03  8:14       ` Zhu Yi
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2010-03-03  8:02 UTC (permalink / raw)
  To: Zhu Yi
  Cc: netdev, David Miller, Arnaldo Carvalho de Melo,
	Pekka Savola (ipv6),
	Patrick McHardy, Vlad Yasevich, Sridhar Samudrala, Per Liden,
	Jon Maloy, Allan Stephens, Andrew Hendry

On Wednesday 03 March 2010 at 15:35 +0800, Zhu Yi wrote:
> On Wed, 2010-03-03 at 14:54 +0800, Eric Dumazet wrote:
> > > diff --git a/include/net/sock.h b/include/net/sock.h
> > > index 6cb1676..847119a 100644
> > > --- a/include/net/sock.h
> > > +++ b/include/net/sock.h
> > > @@ -253,6 +253,8 @@ struct sock {
> > >  	struct {
> > >  		struct sk_buff *head;
> > >  		struct sk_buff *tail;
> > > +		int len;
> > > +		int limit;
> > 
> > This new limit field is really not needed
> 
> sk->sk_rcvbuf can be adjusted via setsockopt from userspace at run time,
> and the minimum allowed value is 256. I'm afraid this will break most
> protocols.

I see. Then maybe use some offset in your test to allow some extra
space.

#define MINBACKLOG 2048

if (sk->sk_backlog.len >= (sk->sk_rcvbuf << 1) + MINBACKLOG)
	return -ENOBUFS;


* Re: [PATCH 1/8] net: add limit for socket backlog
  2010-03-03  8:02     ` Eric Dumazet
@ 2010-03-03  8:14       ` Zhu Yi
  2010-03-03  8:47         ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Zhu Yi @ 2010-03-03  8:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, David Miller, Arnaldo Carvalho de Melo,
	Pekka Savola (ipv6),
	Patrick McHardy, Vlad Yasevich, Sridhar Samudrala, Per Liden,
	Jon Maloy, Allan Stephens, Andrew Hendry

On Wed, 2010-03-03 at 16:02 +0800, Eric Dumazet wrote:
> > sk->sk_rcvbuf can be adjusted via setsockopt from userspace at run
> > time, and the minimum allowed value is 256. I'm afraid this will
> > break most protocols.
> 
> I see, then maybe use some offset in your test, to allow some extra
> space.
> 
> #define MINBACKLOG 2048
> 
> if (sk->sk_backlog.len >= (sk->sk_rcvbuf << 1) + MINBACKLOG)
>         return -ENOBUFS;

I want to provide a method for protocols to set their own socket backlog
limit. For example, sctp_rmem is 373500 (sysctl_sctp_rmem[1]), which is
larger than (sysctl_rmem_default << 1). But I'm not very sure about the
actual required size, so I didn't make that change in the sctp patch.
What do you think?
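
Something like this at socket initialization time, so a protocol could
override the generic default (an illustrative sketch, using the .limit
field from patch 1):

	/* e.g. in sctp socket initialization: */
	sk->sk_backlog.limit = sysctl_sctp_rmem[1];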

Thanks,
-yi

* Re: [PATCH 1/8] net: add limit for socket backlog
  2010-03-03  8:14       ` Zhu Yi
@ 2010-03-03  8:47         ` Eric Dumazet
  2010-03-03  8:59           ` Zhu Yi
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2010-03-03  8:47 UTC (permalink / raw)
  To: Zhu Yi
  Cc: netdev, David Miller, Arnaldo Carvalho de Melo,
	Pekka Savola (ipv6),
	Patrick McHardy, Vlad Yasevich, Sridhar Samudrala, Per Liden,
	Jon Maloy, Allan Stephens, Andrew Hendry

On Wednesday 03 March 2010 at 16:14 +0800, Zhu Yi wrote:

> I want to provide a method for protocols to set their own socket backlog
> limit. For example, sctp_rmem is 373500 (sysctl_sctp_rmem[1]), which is
> larger than (sysctl_rmem_default << 1). But I'm not very sure about the
> actual required size, so I didn't make that change in the sctp patch.
> What do you think?

I think this is not about bug fixes, but followup patches.

We should first address the bug asap, then do further refinements later.


* Re: [PATCH 1/8] net: add limit for socket backlog
  2010-03-03  8:47         ` Eric Dumazet
@ 2010-03-03  8:59           ` Zhu Yi
  0 siblings, 0 replies; 21+ messages in thread
From: Zhu Yi @ 2010-03-03  8:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, David Miller, Arnaldo Carvalho de Melo,
	Pekka Savola (ipv6),
	Patrick McHardy, Vlad Yasevich, Sridhar Samudrala, Per Liden,
	Jon Maloy, Allan Stephens, Andrew Hendry

On Wed, 2010-03-03 at 16:47 +0800, Eric Dumazet wrote:
> On Wednesday 03 March 2010 at 16:14 +0800, Zhu Yi wrote:
> 
> > I want to provide a method for protocols to set their own socket backlog
> > limit. For example, sctp_rmem is 373500 (sysctl_sctp_rmem[1]), which is
> > larger than (sysctl_rmem_default << 1). But I'm not very sure about the
> > actual required size, so I didn't make that change in the sctp patch.
> > What do you think?
> 
> I think this is not about bug fixes, but followup patches.
> 
> We should first address the bug asap, then do further refinements later.

I'm trying not to break the already-working protocols which assume the
backlog is unlimited. In your example, defining MINBACKLOG as 2048 is
apparently too small. Then what should it be? I'm waiting for responses
from the individual protocol maintainers. If all of them feel
(sysctl_rmem_default << 1) is OK, I can remove the .limit field and use
that instead.

Thanks,
-yi

* Re: [PATCH 8/8] x25: use limited socket backlog
  2010-03-03  7:08               ` Eric Dumazet
@ 2010-03-03 11:38                 ` andrew hendry
  2010-03-03 14:00                   ` Zhu, Yi
  0 siblings, 1 reply; 21+ messages in thread
From: andrew hendry @ 2010-03-03 11:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Zhu Yi, netdev

I'll wait for the next spin and in the meantime think about whether there
is a way to test it.
X.25, with no loopback and being so slow, probably can't generate the
same load as your UDP case.

Andrew.

On Wed, Mar 3, 2010 at 6:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wednesday 03 March 2010 at 14:35 +0800, Zhu Yi wrote:
>> Make x25 adapt to the limited socket backlog change.
>>
>> Cc: Andrew Hendry <andrew.hendry@gmail.com>
>> Signed-off-by: Zhu Yi <yi.zhu@intel.com>
>> ---
>>  net/x25/x25_dev.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/net/x25/x25_dev.c b/net/x25/x25_dev.c
>> index 3e1efe5..5688123 100644
>> --- a/net/x25/x25_dev.c
>> +++ b/net/x25/x25_dev.c
>> @@ -53,7 +53,7 @@ static int x25_receive_data(struct sk_buff *skb, struct x25_neigh *nb)
>>               if (!sock_owned_by_user(sk)) {
>>                       queued = x25_process_rx_frame(sk, skb);
>>               } else {
>> -                     sk_add_backlog(sk, skb);
>> +                     __sk_add_backlog(sk, skb);
>>               }
>>               bh_unlock_sock(sk);
>>               sock_put(sk);
>
> Please respin your patch the other way.
>
> I.e. let sk_add_backlog(sk, skb) do its previous job (not leaking skbs
> and returning a void status).
>
> Add a new function that limits the backlog and returns an error code, so
> that the caller can free the skb and increment SNMP counters accordingly.
>
> Callers MUST test the return value, or use another helper that frees the
> skb for them.
>
> Name it sk_move_backlog(), for example.
>
> This will permit you to split the work as you tried.
>
> sk_add_backlog() could be redefined as the helper:
>
> void sk_add_backlog(sk, skb)
> {
>        if (sk_move_backlog(sk, skb)) {
>                kfree_skb(skb);
>                atomic_inc(&sk->sk_drops);
>        }
>
> }
>
> Thanks
>
>
>


* RE: [PATCH 8/8] x25: use limited socket backlog
  2010-03-03 11:38                 ` andrew hendry
@ 2010-03-03 14:00                   ` Zhu, Yi
  2010-03-03 14:33                     ` Eric Dumazet
  0 siblings, 1 reply; 21+ messages in thread
From: Zhu, Yi @ 2010-03-03 14:00 UTC (permalink / raw)
  To: andrew hendry, Eric Dumazet; +Cc: netdev

andrew hendry <andrew.hendry@gmail.com> wrote:

> I'll wait for the next spin and in the meantime think about whether there
> is a way to test it. X.25, with no loopback and being so slow, probably
> can't generate the same load as your UDP case.

I didn't find a way to drop the packet correctly, so I didn't change any
behavior in this patch, nor in the second spin. It will be fine if you
also think x25 doesn't need to limit its backlog size.

Thanks,
-yi

* RE: [PATCH 8/8] x25: use limited socket backlog
  2010-03-03 14:00                   ` Zhu, Yi
@ 2010-03-03 14:33                     ` Eric Dumazet
  2010-03-03 22:44                       ` andrew hendry
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Dumazet @ 2010-03-03 14:33 UTC (permalink / raw)
  To: Zhu, Yi; +Cc: andrew hendry, netdev

On Wednesday 03 March 2010 at 22:00 +0800, Zhu, Yi wrote:
> andrew hendry <andrew.hendry@gmail.com> wrote:
> 
> > I'll wait for the next spin and in the meantime think about whether there
> > is a way to test it. X.25, with no loopback and being so slow, probably
> > can't generate the same load as your UDP case.
> 
> I didn't find a way to drop the packet correctly, so I didn't change any
> behavior in this patch, nor in the second spin. It will be fine if you
> also think x25 doesn't need to limit its backlog size.

So are we sure we can't flood the X25 backlog using X25 over IP?

You discovered a _fatal_ flaw in backlog processing; we should close all
holes, not only the UDP case. You can be sure many bad guys will inspect
all possibilities to bring down Linux hosts.

If you feel uncomfortable with a small limit, just pick a big one, like
256 packets, and you are 100% sure you won't break a protocol. If this
limit happens to be too small, we can change it later.

(No need to count bytes, since truesize includes kernel overhead, and
this overhead depends on whether the host is 32- or 64-bit and on the
kernel version.)
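
A packet-count version could be as simple as the sketch below (the qlen
counter is hypothetical, not an existing field):

	/* hypothetical: limit by packet count instead of bytes */
	if (sk->sk_backlog.qlen >= 256)
		return -ENOBUFS;

	__sk_add_backlog(sk, skb);
	sk->sk_backlog.qlen++;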


* Re: [PATCH 8/8] x25: use limited socket backlog
  2010-03-03 14:33                     ` Eric Dumazet
@ 2010-03-03 22:44                       ` andrew hendry
  0 siblings, 0 replies; 21+ messages in thread
From: andrew hendry @ 2010-03-03 22:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Zhu, Yi, netdev

Thinking like someone trying to break it, it may be possible to flood the
X25 backlog. In specific environments, enough XoT/XoE circuits may be
able to do it. It might also be possible with tun + TUNSETLINK and some
userspace code. A limit should be set just in case, and to cover normal
X25 use it doesn't need to be very big; 256 seems reasonable to start
with. I'll look at setting up a fast virtual X.25 environment to test its
backlog behavior.

I think we should resolve the issue for the common protocols first; I
don't know whether X25=m is widely used.

Andrew.

On Thu, Mar 4, 2010 at 1:33 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wednesday 03 March 2010 at 22:00 +0800, Zhu, Yi wrote:
>> andrew hendry <andrew.hendry@gmail.com> wrote:
>>
>> > I'll wait for the next spin and in the meantime think about whether there
>> > is a way to test it. X.25, with no loopback and being so slow, probably
>> > can't generate the same load as your UDP case.
>>
>> I didn't find a way to drop the packet correctly, so I didn't change any
>> behavior in this patch, nor in the second spin. It will be fine if you
>> also think x25 doesn't need to limit its backlog size.
>
> So are we sure we can't flood the X25 backlog using X25 over IP?
>
> You discovered a _fatal_ flaw in backlog processing; we should close all
> holes, not only the UDP case. You can be sure many bad guys will inspect
> all possibilities to bring down Linux hosts.
>
> If you feel uncomfortable with a small limit, just pick a big one, like
> 256 packets, and you are 100% sure you won't break a protocol. If this
> limit happens to be too small, we can change it later.
>
> (No need to count bytes, since truesize includes kernel overhead, and
> this overhead depends on whether the host is 32- or 64-bit and on the
> kernel version.)
>
>
>

