* [PATCH V2 1/7] net: add limit for socket backlog
From: Zhu Yi @ 2010-03-03  8:36 UTC
  To: netdev
  Cc: Zhu Yi, David Miller, Arnaldo Carvalho de Melo,
	Pekka Savola (ipv6),
	Patrick McHardy, Vlad Yasevich, Sridhar Samudrala, Jon Maloy,
	Allan Stephens, Andrew Hendry, Eric Dumazet

We hit a system OOM while running UDP netperf tests over the loopback
device: multiple senders streamed UDP packets to a single receiver via
loopback on the local host. As expected, the receiver was not able to
handle all the packets in time. But surprisingly, these packets were not
discarded by the receiver's sk->sk_rcvbuf limit. Instead, they kept
queuing up on sk->sk_backlog and finally ate up all the memory. We
believe this is a security hole that allows a non-privileged user to
crash the system.

The root cause of this problem is that while the receiver is in
__release_sock() (i.e. after a userspace recv, kernel udp_recvmsg ->
skb_free_datagram_locked -> release_sock), it moves skbs from the
backlog to sk_receive_queue with softirqs enabled. In the above case,
multiple busy senders can turn this into a nearly endless loop, and the
skbs in the backlog end up eating all the system memory.

The issue is not limited to UDP: any protocol using the socket backlog
is potentially affected. This patch adds a limit to the socket backlog
so that the backlog size cannot grow endlessly.
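
A rough userspace sketch of the reproduction scenario (illustrative
only: the port, datagram size, sender count and timing are made up,
and error handling is omitted):

/* flood a slow UDP receiver over the loopback device */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct sockaddr_in addr;
	char buf[1024] = { 0 };
	int i, rsk, ssk;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(9999);
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	rsk = socket(AF_INET, SOCK_DGRAM, 0);
	bind(rsk, (struct sockaddr *)&addr, sizeof(addr));

	for (i = 0; i < 8; i++) {	/* multiple busy senders */
		if (fork() == 0) {
			ssk = socket(AF_INET, SOCK_DGRAM, 0);
			for (;;)
				sendto(ssk, buf, sizeof(buf), 0,
				       (struct sockaddr *)&addr,
				       sizeof(addr));
		}
	}

	for (;;) {			/* deliberately slow receiver */
		recv(rsk, buf, sizeof(buf), 0);
		usleep(1000);
	}
}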

Reported-by: Alex Shi <alex.shi@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sock.h |   15 ++++++++++++++-
 net/core/sock.c    |   16 ++++++++++++++--
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 6cb1676..2516d76 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -253,6 +253,8 @@ struct sock {
 	struct {
 		struct sk_buff *head;
 		struct sk_buff *tail;
+		int len;
+		int limit;
 	} sk_backlog;
 	wait_queue_head_t	*sk_sleep;
 	struct dst_entry	*sk_dst_cache;
@@ -589,7 +591,7 @@ static inline int sk_stream_memory_free(struct sock *sk)
 	return sk->sk_wmem_queued < sk->sk_sndbuf;
 }
 
-/* The per-socket spinlock must be held here. */
+/* OOB backlog add */
 static inline void sk_add_backlog(struct sock *sk, struct sk_buff *skb)
 {
 	if (!sk->sk_backlog.tail) {
@@ -601,6 +603,17 @@ static inline void sk_add_backlog(struct sock *sk, struct sk_buff *skb)
 	skb->next = NULL;
 }
 
+/* The per-socket spinlock must be held here. */
+static inline int sk_add_backlog_limited(struct sock *sk, struct sk_buff *skb)
+{
+	if (sk->sk_backlog.len >= max(sk->sk_backlog.limit, sk->sk_rcvbuf << 1))
+		return -ENOBUFS;
+
+	sk_add_backlog(sk, skb);
+	sk->sk_backlog.len += skb->truesize;
+	return 0;
+}
+
 static inline int sk_backlog_rcv(struct sock *sk, struct sk_buff *skb)
 {
 	return sk->sk_backlog_rcv(sk, skb);
diff --git a/net/core/sock.c b/net/core/sock.c
index fcd397a..6e22dc9 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -340,8 +340,12 @@ int sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested)
 		rc = sk_backlog_rcv(sk, skb);
 
 		mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
-	} else
-		sk_add_backlog(sk, skb);
+	} else if (sk_add_backlog_limited(sk, skb)) {
+		bh_unlock_sock(sk);
+		atomic_inc(&sk->sk_drops);
+		goto discard_and_relse;
+	}
+
 	bh_unlock_sock(sk);
 out:
 	sock_put(sk);
@@ -1139,6 +1143,7 @@ struct sock *sk_clone(const struct sock *sk, const gfp_t priority)
 		sock_lock_init(newsk);
 		bh_lock_sock(newsk);
 		newsk->sk_backlog.head	= newsk->sk_backlog.tail = NULL;
+		newsk->sk_backlog.len = 0;
 
 		atomic_set(&newsk->sk_rmem_alloc, 0);
 		/*
@@ -1542,6 +1547,12 @@ static void __release_sock(struct sock *sk)
 
 		bh_lock_sock(sk);
 	} while ((skb = sk->sk_backlog.head) != NULL);
+
+	/*
 +	 * Doing the zeroing here guarantees we cannot loop forever
+	 * while a wild producer attempts to flood us.
+	 */
+	sk->sk_backlog.len = 0;
 }
 
 /**
@@ -1874,6 +1885,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 	sk->sk_allocation	=	GFP_KERNEL;
 	sk->sk_rcvbuf		=	sysctl_rmem_default;
 	sk->sk_sndbuf		=	sysctl_wmem_default;
+	sk->sk_backlog.limit	=	sk->sk_rcvbuf << 1;
 	sk->sk_state		=	TCP_CLOSE;
 	sk_set_socket(sk, sock);
 
-- 
1.6.3.3



* [PATCH V2 2/7] tcp: use limited socket backlog
From: Zhu Yi @ 2010-03-03  8:36 UTC
  To: netdev
  Cc: Zhu Yi, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	Patrick McHardy

Make tcp adapt to the limited socket backlog change.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
---
 net/ipv4/tcp_ipv4.c |    6 ++++--
 net/ipv6/tcp_ipv6.c |    6 ++++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index c3588b4..4baf194 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1682,8 +1682,10 @@ process:
 			if (!tcp_prequeue(sk, skb))
 				ret = tcp_v4_do_rcv(sk, skb);
 		}
-	} else
-		sk_add_backlog(sk, skb);
+	} else if (sk_add_backlog_limited(sk, skb)) {
+		bh_unlock_sock(sk);
+		goto discard_and_relse;
+	}
 	bh_unlock_sock(sk);
 
 	sock_put(sk);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 6963a6b..c4ea9d5 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1740,8 +1740,10 @@ process:
 			if (!tcp_prequeue(sk, skb))
 				ret = tcp_v6_do_rcv(sk, skb);
 		}
-	} else
-		sk_add_backlog(sk, skb);
+	} else if (sk_add_backlog_limited(sk, skb)) {
+		bh_unlock_sock(sk);
+		goto discard_and_relse;
+	}
 	bh_unlock_sock(sk);
 
 	sock_put(sk);
-- 
1.6.3.3



* [PATCH V2 3/7] udp: use limited socket backlog
From: Zhu Yi @ 2010-03-03  8:36 UTC
  To: netdev
  Cc: Zhu Yi, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	Patrick McHardy

Make udp adapt to the limited socket backlog change.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
---
 net/ipv4/udp.c |    6 ++++--
 net/ipv6/udp.c |   28 ++++++++++++++++++----------
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 608a544..e7eb47f 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1371,8 +1371,10 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	bh_lock_sock(sk);
 	if (!sock_owned_by_user(sk))
 		rc = __udp_queue_rcv_skb(sk, skb);
-	else
-		sk_add_backlog(sk, skb);
+	else if (sk_add_backlog_limited(sk, skb)) {
+		bh_unlock_sock(sk);
+		goto drop;
+	}
 	bh_unlock_sock(sk);
 
 	return rc;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 52b8347..6480491 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -583,16 +583,20 @@ static void flush_stack(struct sock **stack, unsigned int count,
 			bh_lock_sock(sk);
 			if (!sock_owned_by_user(sk))
 				udpv6_queue_rcv_skb(sk, skb1);
-			else
-				sk_add_backlog(sk, skb1);
+			else if (sk_add_backlog_limited(sk, skb1)) {
+				kfree_skb(skb1);
+				bh_unlock_sock(sk);
+				goto drop;
+			}
 			bh_unlock_sock(sk);
-		} else {
-			atomic_inc(&sk->sk_drops);
-			UDP6_INC_STATS_BH(sock_net(sk),
-					UDP_MIB_RCVBUFERRORS, IS_UDPLITE(sk));
-			UDP6_INC_STATS_BH(sock_net(sk),
-					UDP_MIB_INERRORS, IS_UDPLITE(sk));
+			continue;
 		}
+drop:
+		atomic_inc(&sk->sk_drops);
+		UDP6_INC_STATS_BH(sock_net(sk),
+				UDP_MIB_RCVBUFERRORS, IS_UDPLITE(sk));
+		UDP6_INC_STATS_BH(sock_net(sk),
+				UDP_MIB_INERRORS, IS_UDPLITE(sk));
 	}
 }
 /*
@@ -754,8 +758,12 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	bh_lock_sock(sk);
 	if (!sock_owned_by_user(sk))
 		udpv6_queue_rcv_skb(sk, skb);
-	else
-		sk_add_backlog(sk, skb);
+	else if (sk_add_backlog_limited(sk, skb)) {
+		atomic_inc(&sk->sk_drops);
+		bh_unlock_sock(sk);
+		sock_put(sk);
+		goto discard;
+	}
 	bh_unlock_sock(sk);
 	sock_put(sk);
 	return 0;
-- 
1.6.3.3



* [PATCH V2 5/7] sctp: use limited socket backlog
From: Zhu Yi @ 2010-03-03  8:36 UTC
  To: netdev; +Cc: Zhu Yi, Vlad Yasevich, Sridhar Samudrala

Make sctp adapt to the limited socket backlog change.

Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
---
 net/sctp/input.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/sctp/input.c b/net/sctp/input.c
index c0c973e..20e69c3 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -75,7 +75,7 @@ static struct sctp_association *__sctp_lookup_association(
 					const union sctp_addr *peer,
 					struct sctp_transport **pt);
 
-static void sctp_add_backlog(struct sock *sk, struct sk_buff *skb);
+static int sctp_add_backlog(struct sock *sk, struct sk_buff *skb);
 
 
 /* Calculate the SCTP checksum of an SCTP packet.  */
@@ -265,8 +265,12 @@ int sctp_rcv(struct sk_buff *skb)
 	}
 
 	if (sock_owned_by_user(sk)) {
+		if (sctp_add_backlog(sk, skb)) {
+			sctp_bh_unlock_sock(sk);
+			sctp_chunk_free(chunk);
+			goto discard_release;
+		}
 		SCTP_INC_STATS_BH(SCTP_MIB_IN_PKT_BACKLOG);
-		sctp_add_backlog(sk, skb);
 	} else {
 		SCTP_INC_STATS_BH(SCTP_MIB_IN_PKT_SOFTIRQ);
 		sctp_inq_push(&chunk->rcvr->inqueue, chunk);
@@ -362,7 +366,7 @@ done:
 	return 0;
 }
 
-static void sctp_add_backlog(struct sock *sk, struct sk_buff *skb)
+static int sctp_add_backlog(struct sock *sk, struct sk_buff *skb)
 {
 	struct sctp_chunk *chunk = SCTP_INPUT_CB(skb)->chunk;
 	struct sctp_ep_common *rcvr = chunk->rcvr;
@@ -377,7 +381,7 @@ static void sctp_add_backlog(struct sock *sk, struct sk_buff *skb)
 	else
 		BUG();
 
-	sk_add_backlog(sk, skb);
+	return sk_add_backlog_limited(sk, skb);
 }
 
 /* Handle icmp frag needed error. */
-- 
1.6.3.3



* [PATCH V2 6/7] tipc: use limited socket backlog
From: Zhu Yi @ 2010-03-03  8:36 UTC
  To: netdev; +Cc: Zhu Yi, Jon Maloy, Allan Stephens

Make tipc adapt to the limited socket backlog change.

Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
---
 net/tipc/socket.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 1ea64f0..22bfbc3 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1322,8 +1322,10 @@ static u32 dispatch(struct tipc_port *tport, struct sk_buff *buf)
 	if (!sock_owned_by_user(sk)) {
 		res = filter_rcv(sk, buf);
 	} else {
-		sk_add_backlog(sk, buf);
-		res = TIPC_OK;
+		if (sk_add_backlog_limited(sk, buf))
+			res = TIPC_ERR_OVERLOAD;
+		else
+			res = TIPC_OK;
 	}
 	bh_unlock_sock(sk);
 
-- 
1.6.3.3



* [PATCH V2 7/7] net: backlog functions rename
From: Zhu Yi @ 2010-03-03  8:36 UTC
  To: netdev; +Cc: Zhu Yi

sk_add_backlog -> __sk_add_backlog
sk_add_backlog_limited -> sk_add_backlog

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
---
 include/net/sock.h       |    6 +++---
 net/core/sock.c          |    2 +-
 net/dccp/minisocks.c     |    2 +-
 net/ipv4/tcp_ipv4.c      |    2 +-
 net/ipv4/tcp_minisocks.c |    2 +-
 net/ipv4/udp.c           |    2 +-
 net/ipv6/tcp_ipv6.c      |    2 +-
 net/ipv6/udp.c           |    4 ++--
 net/llc/llc_c_ac.c       |    2 +-
 net/llc/llc_conn.c       |    2 +-
 net/sctp/input.c         |    4 ++--
 net/tipc/socket.c        |    2 +-
 net/x25/x25_dev.c        |    2 +-
 13 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 2516d76..170353d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -592,7 +592,7 @@ static inline int sk_stream_memory_free(struct sock *sk)
 }
 
 /* OOB backlog add */
-static inline void sk_add_backlog(struct sock *sk, struct sk_buff *skb)
+static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
 {
 	if (!sk->sk_backlog.tail) {
 		sk->sk_backlog.head = sk->sk_backlog.tail = skb;
@@ -604,12 +604,12 @@ static inline void sk_add_backlog(struct sock *sk, struct sk_buff *skb)
 }
 
 /* The per-socket spinlock must be held here. */
-static inline int sk_add_backlog_limited(struct sock *sk, struct sk_buff *skb)
+static inline int sk_add_backlog(struct sock *sk, struct sk_buff *skb)
 {
 	if (sk->sk_backlog.len >= max(sk->sk_backlog.limit, sk->sk_rcvbuf << 1))
 		return -ENOBUFS;
 
-	sk_add_backlog(sk, skb);
+	__sk_add_backlog(sk, skb);
 	sk->sk_backlog.len += skb->truesize;
 	return 0;
 }
diff --git a/net/core/sock.c b/net/core/sock.c
index 6e22dc9..61a65a2 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -340,7 +340,7 @@ int sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested)
 		rc = sk_backlog_rcv(sk, skb);
 
 		mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
-	} else if (sk_add_backlog_limited(sk, skb)) {
+	} else if (sk_add_backlog(sk, skb)) {
 		bh_unlock_sock(sk);
 		atomic_inc(&sk->sk_drops);
 		goto discard_and_relse;
diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c
index af226a0..0d508c3 100644
--- a/net/dccp/minisocks.c
+++ b/net/dccp/minisocks.c
@@ -254,7 +254,7 @@ int dccp_child_process(struct sock *parent, struct sock *child,
 		 * in main socket hash table and lock on listening
 		 * socket does not protect us more.
 		 */
-		sk_add_backlog(child, skb);
+		__sk_add_backlog(child, skb);
 	}
 
 	bh_unlock_sock(child);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 4baf194..1915f7d 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1682,7 +1682,7 @@ process:
 			if (!tcp_prequeue(sk, skb))
 				ret = tcp_v4_do_rcv(sk, skb);
 		}
-	} else if (sk_add_backlog_limited(sk, skb)) {
+	} else if (sk_add_backlog(sk, skb)) {
 		bh_unlock_sock(sk);
 		goto discard_and_relse;
 	}
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index f206ee5..4199bc6 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -728,7 +728,7 @@ int tcp_child_process(struct sock *parent, struct sock *child,
 		 * in main socket hash table and lock on listening
 		 * socket does not protect us more.
 		 */
-		sk_add_backlog(child, skb);
+		__sk_add_backlog(child, skb);
 	}
 
 	bh_unlock_sock(child);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e7eb47f..7af756d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1371,7 +1371,7 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	bh_lock_sock(sk);
 	if (!sock_owned_by_user(sk))
 		rc = __udp_queue_rcv_skb(sk, skb);
-	else if (sk_add_backlog_limited(sk, skb)) {
+	else if (sk_add_backlog(sk, skb)) {
 		bh_unlock_sock(sk);
 		goto drop;
 	}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index c4ea9d5..2c378b1 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1740,7 +1740,7 @@ process:
 			if (!tcp_prequeue(sk, skb))
 				ret = tcp_v6_do_rcv(sk, skb);
 		}
-	} else if (sk_add_backlog_limited(sk, skb)) {
+	} else if (sk_add_backlog(sk, skb)) {
 		bh_unlock_sock(sk);
 		goto discard_and_relse;
 	}
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 6480491..3c0c9c7 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -583,7 +583,7 @@ static void flush_stack(struct sock **stack, unsigned int count,
 			bh_lock_sock(sk);
 			if (!sock_owned_by_user(sk))
 				udpv6_queue_rcv_skb(sk, skb1);
-			else if (sk_add_backlog_limited(sk, skb1)) {
+			else if (sk_add_backlog(sk, skb1)) {
 				kfree_skb(skb1);
 				bh_unlock_sock(sk);
 				goto drop;
@@ -758,7 +758,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	bh_lock_sock(sk);
 	if (!sock_owned_by_user(sk))
 		udpv6_queue_rcv_skb(sk, skb);
-	else if (sk_add_backlog_limited(sk, skb)) {
+	else if (sk_add_backlog(sk, skb)) {
 		atomic_inc(&sk->sk_drops);
 		bh_unlock_sock(sk);
 		sock_put(sk);
diff --git a/net/llc/llc_c_ac.c b/net/llc/llc_c_ac.c
index 019c780..86d6985 100644
--- a/net/llc/llc_c_ac.c
+++ b/net/llc/llc_c_ac.c
@@ -1437,7 +1437,7 @@ static void llc_process_tmr_ev(struct sock *sk, struct sk_buff *skb)
 			llc_conn_state_process(sk, skb);
 		else {
 			llc_set_backlog_type(skb, LLC_EVENT);
-			sk_add_backlog(sk, skb);
+			__sk_add_backlog(sk, skb);
 		}
 	}
 }
diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c
index c0539ff..a12144d 100644
--- a/net/llc/llc_conn.c
+++ b/net/llc/llc_conn.c
@@ -827,7 +827,7 @@ void llc_conn_handler(struct llc_sap *sap, struct sk_buff *skb)
 	else {
 		dprintk("%s: adding to backlog...\n", __func__);
 		llc_set_backlog_type(skb, LLC_PACKET);
-		if (sk_add_backlog_limited(sk, skb))
+		if (sk_add_backlog(sk, skb))
 			goto drop_unlock;
 	}
 out:
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 20e69c3..ca42181 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -340,7 +340,7 @@ int sctp_backlog_rcv(struct sock *sk, struct sk_buff *skb)
 		sctp_bh_lock_sock(sk);
 
 		if (sock_owned_by_user(sk)) {
-			sk_add_backlog(sk, skb);
+			__sk_add_backlog(sk, skb);
 			backloged = 1;
 		} else
 			sctp_inq_push(inqueue, chunk);
@@ -381,7 +381,7 @@ static int sctp_add_backlog(struct sock *sk, struct sk_buff *skb)
 	else
 		BUG();
 
-	return sk_add_backlog_limited(sk, skb);
+	return sk_add_backlog(sk, skb);
 }
 
 /* Handle icmp frag needed error. */
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 22bfbc3..4b235fc 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1322,7 +1322,7 @@ static u32 dispatch(struct tipc_port *tport, struct sk_buff *buf)
 	if (!sock_owned_by_user(sk)) {
 		res = filter_rcv(sk, buf);
 	} else {
-		if (sk_add_backlog_limited(sk, buf))
+		if (sk_add_backlog(sk, buf))
 			res = TIPC_ERR_OVERLOAD;
 		else
 			res = TIPC_OK;
diff --git a/net/x25/x25_dev.c b/net/x25/x25_dev.c
index 3e1efe5..5688123 100644
--- a/net/x25/x25_dev.c
+++ b/net/x25/x25_dev.c
@@ -53,7 +53,7 @@ static int x25_receive_data(struct sk_buff *skb, struct x25_neigh *nb)
 		if (!sock_owned_by_user(sk)) {
 			queued = x25_process_rx_frame(sk, skb);
 		} else {
-			sk_add_backlog(sk, skb);
+			__sk_add_backlog(sk, skb);
 		}
 		bh_unlock_sock(sk);
 		sock_put(sk);
-- 
1.6.3.3



* Re: [PATCH V2 2/7] tcp: use limited socket backlog
From: Eric Dumazet @ 2010-03-03  8:53 UTC
  To: Zhu Yi
  Cc: netdev, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	Patrick McHardy

On Wednesday 03 March 2010 at 16:36 +0800, Zhu Yi wrote:
> Make tcp adapt to the limited socket backlog change.
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
> Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
> Cc: Patrick McHardy <kaber@trash.net>
> Signed-off-by: Zhu Yi <yi.zhu@intel.com>
> ---
>  net/ipv4/tcp_ipv4.c |    6 ++++--
>  net/ipv6/tcp_ipv6.c |    6 ++++--
>  2 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index c3588b4..4baf194 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1682,8 +1682,10 @@ process:
>  			if (!tcp_prequeue(sk, skb))
>  				ret = tcp_v4_do_rcv(sk, skb);
>  		}
> -	} else
> -		sk_add_backlog(sk, skb);
> +	} else if (sk_add_backlog_limited(sk, skb)) {
> +		bh_unlock_sock(sk);
> +		goto discard_and_relse;
> +	}
>  	bh_unlock_sock(sk);
>  
>  	sock_put(sk);

So no counter is incremented to reflect this loss, sk->sk_drops (local
counter) or SNMP ?




* Re: [PATCH V2 1/7] net: add limit for socket backlog
From: David Miller @ 2010-03-03  8:54 UTC
  To: yi.zhu
  Cc: netdev, acme, pekkas, kaber, vladislav.yasevich, sri, jon.maloy,
	allan.stephens, andrew.hendry, eric.dumazet


Please don't selectively CC: me only on some of the patches
in a series that's meant to go into the networking tree.

I handle patch submissions by deleting the copies from my inbox that
come to me through the mailing list, so if you don't CC: me on all
of them I'll lose some of them.

Thanks.



* Re: [PATCH V2 1/7] net: add limit for socket backlog
From: Zhu Yi @ 2010-03-03  9:01 UTC
  To: David Miller
  Cc: netdev, acme, pekkas, kaber, vladislav.yasevich, sri, jon.maloy,
	allan.stephens, andrew.hendry, eric.dumazet

On Wed, 2010-03-03 at 16:54 +0800, David Miller wrote:
> 
> Please don't selectively CC: me only on some of the patches
> in a series that's meant to go into the networking tree.
> 
> I handle patch submissions by deleting the copies from my inbox that
> come to me through the mailing list, so if you don't CC: me on all
> of them I'll lose some of them. 

OK. The series is not final. I'll do so next time.

Thanks,
-yi



* Re: [PATCH V2 2/7] tcp: use limited socket backlog
From: Zhu Yi @ 2010-03-03  9:06 UTC
  To: Eric Dumazet
  Cc: netdev, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	Patrick McHardy

On Wed, 2010-03-03 at 16:53 +0800, Eric Dumazet wrote:
> > @@ -1682,8 +1682,10 @@ process:
> >                       if (!tcp_prequeue(sk, skb))
> >                               ret = tcp_v4_do_rcv(sk, skb);
> >               }
> > -     } else
> > -             sk_add_backlog(sk, skb);
> > +     } else if (sk_add_backlog_limited(sk, skb)) {
> > +             bh_unlock_sock(sk);
> > +             goto discard_and_relse;
> > +     }
> >       bh_unlock_sock(sk);
> >  
> >       sock_put(sk);
> 
> So no counter is incremented to reflect this loss, sk->sk_drops (local
> counter) or SNMP ? 

I simply followed how the code was originally written. As you can see,
tcp_v4_do_rcv() doesn't always do so. And at the backlog queuing point,
we don't even bother to check.

Thanks,
-yi



* Re: [PATCH V2 2/7] tcp: use limited socket backlog
From: Eric Dumazet @ 2010-03-03 10:07 UTC
  To: Zhu Yi
  Cc: netdev, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	Patrick McHardy

On Wednesday 03 March 2010 at 17:06 +0800, Zhu Yi wrote:

> I simply followed how the code was originally written. As you can see,
> tcp_v4_do_rcv() doesn't always do so. And at the backlog queuing point,
> we don't even bother to check.

You add a new point where a packet can be dropped; this should be
accounted for, so that admins can have a clue about what's going on.

Previously, the packet was always queued, and dropped later (and
accounted for).

Not everybody runs drop monitor :)




* Re: [PATCH V2 5/7] sctp: use limited socket backlog
From: Vlad Yasevich @ 2010-03-03 14:10 UTC
  To: Zhu Yi; +Cc: netdev, Sridhar Samudrala



Zhu Yi wrote:
> Make sctp adapt to the limited socket backlog change.
> 
> Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
> Cc: Sridhar Samudrala <sri@us.ibm.com>
> Signed-off-by: Zhu Yi <yi.zhu@intel.com>
> ---
>  net/sctp/input.c |   12 ++++++++----
>  1 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/net/sctp/input.c b/net/sctp/input.c
> index c0c973e..20e69c3 100644
> --- a/net/sctp/input.c
> +++ b/net/sctp/input.c
> @@ -75,7 +75,7 @@ static struct sctp_association *__sctp_lookup_association(
>  					const union sctp_addr *peer,
>  					struct sctp_transport **pt);
>  
> -static void sctp_add_backlog(struct sock *sk, struct sk_buff *skb);
> +static int sctp_add_backlog(struct sock *sk, struct sk_buff *skb);
>  
>  
>  /* Calculate the SCTP checksum of an SCTP packet.  */
> @@ -265,8 +265,12 @@ int sctp_rcv(struct sk_buff *skb)
>  	}
>  
>  	if (sock_owned_by_user(sk)) {
> +		if (sctp_add_backlog(sk, skb)) {
> +			sctp_bh_unlock_sock(sk);
> +			sctp_chunk_free(chunk);
> +			goto discard_release;
> +		}

I think this will result in a double-free of the skb, because sctp_chunk_free
attempts to free the skb that's been assigned to the chunk.  You can set the skb
to NULL to get around that.
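
Something along these lines in sctp_rcv() would avoid the double free
(a sketch, untested):

	if (sctp_add_backlog(sk, skb)) {
		sctp_bh_unlock_sock(sk);
		/* skb is freed on the discard_release path below;
		 * detach it so sctp_chunk_free() does not free it
		 * a second time.
		 */
		chunk->skb = NULL;
		sctp_chunk_free(chunk);
		goto discard_release;
	}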

>  		SCTP_INC_STATS_BH(SCTP_MIB_IN_PKT_BACKLOG);
> -		sctp_add_backlog(sk, skb);
>  	} else {
>  		SCTP_INC_STATS_BH(SCTP_MIB_IN_PKT_SOFTIRQ);
>  		sctp_inq_push(&chunk->rcvr->inqueue, chunk);
> @@ -362,7 +366,7 @@ done:
>  	return 0;
>  }
>  
> -static void sctp_add_backlog(struct sock *sk, struct sk_buff *skb)
> +static int sctp_add_backlog(struct sock *sk, struct sk_buff *skb)
>  {
>  	struct sctp_chunk *chunk = SCTP_INPUT_CB(skb)->chunk;
>  	struct sctp_ep_common *rcvr = chunk->rcvr;
> @@ -377,7 +381,7 @@ static void sctp_add_backlog(struct sock *sk, struct sk_buff *skb)
>  	else
>  		BUG();
>  
> -	sk_add_backlog(sk, skb);
> +	return sk_add_backlog_limited(sk, skb);
>  }

You also leak the ref counts here, since it's now possible to not add a
packet to the backlog queue. That means you'll take refs but never drop
them, because the receive routine will never run.
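
One way to plug the leak is to take the refs only after a successful
add, along these lines (a sketch, untested):

static int sctp_add_backlog(struct sock *sk, struct sk_buff *skb)
{
	struct sctp_chunk *chunk = SCTP_INPUT_CB(skb)->chunk;
	struct sctp_ep_common *rcvr = chunk->rcvr;
	int ret;

	ret = sk_add_backlog_limited(sk, skb);
	if (!ret) {
		/* Hold the assoc/ep while hanging on the backlog queue.
		 * This way, we know structures we need will not disappear.
		 */
		if (SCTP_EP_TYPE_ASSOCIATION == rcvr->type)
			sctp_association_hold(sctp_assoc(rcvr));
		else if (SCTP_EP_TYPE_SOCKET == rcvr->type)
			sctp_endpoint_hold(sctp_ep(rcvr));
		else
			BUG();
	}
	return ret;
}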

-vlad
>  
>  /* Handle icmp frag needed error. */


* RE: [PATCH V2 2/7] tcp: use limited socket backlog
From: Zhu, Yi @ 2010-03-03 14:12 UTC
  To: Eric Dumazet
  Cc: netdev, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	Patrick McHardy

Eric Dumazet [mailto:eric.dumazet@gmail.com] wrote:
> On Wednesday 03 March 2010 at 17:06 +0800, Zhu Yi wrote:

>> I simply followed how the code was originally written. As you can see,
>> tcp_v4_do_rcv() doesn't always do so. And at the backlog queuing point,
>> we don't even bother to check.

> You add a new point where a packet can be dropped; this should be
> accounted for, so that admins can have a clue about what's going on.

> Previously, the packet was always queued, and dropped later (and
> accounted for).

In case the skb doesn't have an MD5 option while we are expecting one,
or we fail to find the sk for the skb's connection request, etc., the
skb is dropped silently in tcp_v4_do_rcv(). No?

> Not everybody runs drop monitor :)

Thanks,
-yi


* RE: [PATCH V2 5/7] sctp: use limited socket backlog
From: Zhu, Yi @ 2010-03-03 14:19 UTC
  To: Vlad Yasevich; +Cc: netdev, Sridhar Samudrala

Vlad Yasevich <vladislav.yasevich@hp.com> wrote:

> I think this will result in a double-free of the skb, because sctp_chunk_free
> attempts to free the skb that's been assigned to the chunk.  You can set the skb
> to NULL to get around that.

Ah, I missed that. Thanks!

<...>

> You also leak the ref counts here, since it's now possible to not add a
> packet to the backlog queue. That means you'll take refs but never drop
> them, because the receive routine will never run.

Good catch. I'll fix it.

BTW, is the current backlog limit (sysctl_rmem_default << 1) enough for sctp?
I noticed sysctl_sctp_rmem[1] is set to 373500 on my box.

Thanks,
-yi



* RE: [PATCH V2 2/7] tcp: use limited socket backlog
From: Eric Dumazet @ 2010-03-03 14:31 UTC
  To: Zhu, Yi
  Cc: netdev, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	Patrick McHardy

On Wednesday 03 March 2010 at 22:12 +0800, Zhu, Yi wrote:
> Eric Dumazet [mailto:eric.dumazet@gmail.com] wrote:
> > On Wednesday 03 March 2010 at 17:06 +0800, Zhu Yi wrote:
> 
> >> I simply followed how the code was originally written. As you can see,
> >> tcp_v4_do_rcv() doesn't always do so. And at the backlog queuing point,
> >> we don't even bother to check.
> 
> > You add a new point where a packet can be dropped; this should be
> > accounted for, so that admins can have a clue about what's going on.
> 
> > Previously, the packet was always queued, and dropped later (and accounted for).
> 
> In case the skb doesn't have an MD5 option while we are expecting one, or we
> fail to find the sk for the skb's connection request, etc., the skb is dropped
> silently in tcp_v4_do_rcv(). No?

Then it's a separate bug. MD5 support added so many bugs it's not even
funny.

Existing bugs are not an excuse for adding new ones; we try the reverse.
No?





* Re: [PATCH V2 5/7] sctp: use limited socket backlog
From: Vlad Yasevich @ 2010-03-03 14:36 UTC
  To: Zhu, Yi; +Cc: netdev, Sridhar Samudrala



Zhu, Yi wrote:
> Vlad Yasevich <vladislav.yasevich@hp.com> wrote:
> 
>> I think this will result in a double-free of the skb, because sctp_chunk_free
>> attempts to free the skb that's been assigned to the chunk.  You can set the skb
>> to NULL to get around that.
> 
> Ah, I missed that. Thanks!
> 
> <...>
> 
>> You also leak the ref counts here since now it's possible to not add a packet to
>> the backlog queue.  That means you'll take refs, but never drop them because
>> the receive routing will never run.
> 
> Good catch. I'll fix it.
> 
>> BTW, is the current backlog limit (sysctl_rmem_default << 1) enough for sctp?
>> I noticed sysctl_sctp_rmem[1] is set to 373500 on my box.
> 

sctp uses the same algorithm as TCP to figure out the memory values.
I guess the issue with using the smaller value is that it would be possible to
queue more to the socket receive buffer than to the backlog. Thus the backlog
would start dropping packets even though the receive buffer would still accept
them.

-vlad

> Thanks,
> -yi
> 


* RE: [PATCH V2 5/7] sctp: use limited socket backlog
From: Zhu, Yi @ 2010-03-04  2:00 UTC
  To: Vlad Yasevich; +Cc: netdev, Sridhar Samudrala

Vlad Yasevich <vladislav.yasevich@hp.com> wrote:

>> BTW, is the current backlog limit (sysctl_rmem_default << 1) enough for sctp?
>> I noticed sysctl_sctp_rmem[1] is set to 373500 on my box.
>> 

> sctp uses the same algorithm as TCP to figure out the memory values.
> I guess the issue with using the smaller value is that it would be possible to
> queue more to the socket receive buffer than to the backlog. Thus the backlog
> would start dropping packets even though the receive buffer would still accept
> them.

sysctl_tcp_rmem[1] = 87380 on my box, which is much smaller than sctp's. The
backlog limit is set to 258048 by default, which is smaller than
sysctl_sctp_rmem[1] = 373500. I don't think it is correct for the backlog to
start dropping packets before the receive queue does. Fortunately, we can set
the limit per protocol. I'll set sk->sk_backlog.limit to sysctl_sctp_rmem[1]
in sctp_init_sock() for sctp socks in the next version.
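
Something along these lines (a sketch of the plan, untested):

	/* in sctp_init_sock(), with the other per-socket defaults */
	sk->sk_backlog.limit = sysctl_sctp_rmem[1];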

Thanks,
-yi


* RE: [PATCH V2 2/7] tcp: use limited socket backlog
From: Zhu, Yi @ 2010-03-04  5:21 UTC
  To: Eric Dumazet
  Cc: netdev, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	Patrick McHardy

Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Then it's a separate bug. MD5 support added so many bugs it's not even
> funny.

> Existing bugs are not an excuse for adding new ones; we try the reverse.
> No?

Can you show me where sk_drops is used by TCP and what SNMP MIB value
should I use for backlog dropping? TCP_MIB_INERRS doesn't seem correct.

Thanks,
-yi



* RE: [PATCH V2 2/7] tcp: use limited socket backlog
From: Eric Dumazet @ 2010-03-04  6:00 UTC
  To: Zhu, Yi
  Cc: netdev, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	Patrick McHardy

On Thursday 04 March 2010 at 13:21 +0800, Zhu, Yi wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > Then it's a separate bug. MD5 support added so many bugs it's not even
> > funny.
> 
> > Existing bugs are not an excuse for adding new ones; we try the reverse.
> > No?
> 
> Can you show me where sk_drops is used by TCP and what SNMP MIB value
> should I use for backlog dropping? TCP_MIB_INERRS doesn't seem correct.

sk_drops is not yet used by TCP, because when backlog processing is
performed, the TCP state machine has much finer-grained capabilities to
show why a packet is dropped. In our backlog drop, we don't examine the
details of the packet; we drop it at a lower level.

Please add a new counter, say LINUX_MIB_TCPBACKLOGDROP :

3 added lines:
- one in "include/linux/snmp.h" to define the MIB name
- one to define its string
- one to perform the increment when the actual drop occurs

A good starting point would be to study recent commit
72032fdbcde8b333e65b3430e1bcb4358e2d6716 from Jamal
(xfrm: Introduce LINUX_MIB_XFRMFWDHDRERROR)

For sk_drops, just increment it and we can read it at the generic sock level.
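
Sketched out (untested, and the exact placement within each file is
illustrative), the additions could look like:

/* include/linux/snmp.h, in the LINUX_MIB enum */
	LINUX_MIB_TCPBACKLOGDROP,		/* TCPBacklogDrop */

/* net/ipv4/proc.c, in snmp4_net_list[] */
	SNMP_MIB_ITEM("TCPBacklogDrop", LINUX_MIB_TCPBACKLOGDROP),

/* net/ipv4/tcp_ipv4.c, at the new drop point */
	} else if (sk_add_backlog(sk, skb)) {
		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPBACKLOGDROP);
		bh_unlock_sock(sk);
		goto discard_and_relse;
	}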

Thanks !




* RE: [PATCH V2 2/7] tcp: use limited socket backlog
From: Zhu, Yi @ 2010-03-04 11:04 UTC
  To: Eric Dumazet
  Cc: netdev, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	Patrick McHardy

Eric Dumazet <eric.dumazet@gmail.com> wrote:

>> Can you show me where sk_drops is used by TCP and what SNMP MIB value
>> should I use for backlog dropping? TCP_MIB_INERRS doesn't seem correct.

> sk_drops is not yet used by TCP, because when backlog processing is
> performed, the TCP state machine has much finer-grained capabilities to
> show why a packet is dropped. In our backlog drop, we don't examine the
> details of the packet; we drop it at a lower level.

> Please add a new counter, say LINUX_MIB_TCPBACKLOGDROP :

> 3 added lines:
> - one in "include/linux/snmp.h" to define the MIB name
> - one to define its string
> - one to perform the increment when the actual drop occurs

> A good starting point would be to study recent commit
> 72032fdbcde8b333e65b3430e1bcb4358e2d6716 from Jamal
> (xfrm: Introduce LINUX_MIB_XFRMFWDHDRERROR)

> For sk_drops, just increment it and we can read it at the generic sock level.

Since neither sk_drops nor the new MIB value is currently used by TCP,
how about I keep the tcp backlog limit patch as is and you implement
the above in another patch?

Thanks,
-yi


* RE: [PATCH V2 2/7] tcp: use limited socket backlog
From: Eric Dumazet @ 2010-03-04 14:56 UTC
  To: Zhu, Yi
  Cc: netdev, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	Patrick McHardy

On Thursday 04 March 2010 at 19:04 +0800, Zhu, Yi wrote:

> 
> Since neither sk_drops nor the new MIB value is currently used by TCP,
> how about I keep the tcp backlog limit patch as is and you implement
> the above in another patch?

Sure, I will do that.



