netdev.vger.kernel.org archive mirror
* [PATCH] ipv4: Namespaceify tcp_max_orphans knob
@ 2017-09-07  3:10 Haishuang Yan
  2017-09-08 22:13 ` Cong Wang
  0 siblings, 1 reply; 7+ messages in thread
From: Haishuang Yan @ 2017-09-07  3:10 UTC (permalink / raw)
  To: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, Eric Dumazet
  Cc: netdev, linux-kernel, Haishuang Yan

Applications in different network namespaces might require a different
maximum number of orphaned TCP sockets, independently of the host.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
 include/net/netns/ipv4.h   |  1 +
 include/net/tcp.h          |  5 +++--
 net/ipv4/sysctl_net_ipv4.c | 14 +++++++-------
 net/ipv4/tcp.c             |  3 ---
 net/ipv4/tcp_input.c       |  1 -
 net/ipv4/tcp_ipv4.c        |  1 +
 6 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 20d061c..305e031 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -127,6 +127,7 @@ struct netns_ipv4 {
 	int sysctl_tcp_timestamps;
 	struct inet_timewait_death_row tcp_death_row;
 	int sysctl_max_syn_backlog;
+	int sysctl_tcp_max_orphans;
 
 #ifdef CONFIG_NET_L3_MASTER_DEV
 	int sysctl_udp_l3mdev_accept;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index b510f28..ac2d998 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -320,10 +320,11 @@ static inline bool tcp_too_many_orphans(struct sock *sk, int shift)
 {
 	struct percpu_counter *ocp = sk->sk_prot->orphan_count;
 	int orphans = percpu_counter_read_positive(ocp);
+	int tcp_max_orphans = sock_net(sk)->ipv4.sysctl_tcp_max_orphans;
 
-	if (orphans << shift > sysctl_tcp_max_orphans) {
+	if (orphans << shift > tcp_max_orphans) {
 		orphans = percpu_counter_sum_positive(ocp);
-		if (orphans << shift > sysctl_tcp_max_orphans)
+		if (orphans << shift > tcp_max_orphans)
 			return true;
 	}
 	return false;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 0d3c038..4f26c8d3 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -394,13 +394,6 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
 		.proc_handler	= proc_dointvec
 	},
 	{
-		.procname	= "tcp_max_orphans",
-		.data		= &sysctl_tcp_max_orphans,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec
-	},
-	{
 		.procname	= "tcp_fastopen",
 		.data		= &sysctl_tcp_fastopen,
 		.maxlen		= sizeof(int),
@@ -1085,6 +1078,13 @@ static int proc_tcp_available_ulp(struct ctl_table *ctl,
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "tcp_max_orphans",
+		.data		= &init_net.ipv4.sysctl_tcp_max_orphans,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	{
 		.procname	= "fib_multipath_use_neigh",
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5091402..39187ac 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3522,9 +3522,6 @@ void __init tcp_init(void)
 	}
 
 
-	cnt = tcp_hashinfo.ehash_mask + 1;
-	sysctl_tcp_max_orphans = cnt / 2;
-
 	tcp_init_mem();
 	/* Set per-socket limits to no more than 1/128 the pressure threshold */
 	limit = nr_free_buffer_pages() << (PAGE_SHIFT - 7);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c5d7656..0230509 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -88,7 +88,6 @@
 
 int sysctl_tcp_stdurg __read_mostly;
 int sysctl_tcp_rfc1337 __read_mostly;
-int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
 int sysctl_tcp_frto __read_mostly = 2;
 int sysctl_tcp_min_rtt_wlen __read_mostly = 300;
 int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a63486a..4b17a91 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2468,6 +2468,7 @@ static int __net_init tcp_sk_init(struct net *net)
 	net->ipv4.tcp_death_row.hashinfo = &tcp_hashinfo;
 
 	net->ipv4.sysctl_max_syn_backlog = max(128, cnt / 256);
+	net->ipv4.sysctl_tcp_max_orphans = cnt / 2;
 	net->ipv4.sysctl_tcp_sack = 1;
 	net->ipv4.sysctl_tcp_window_scaling = 1;
 	net->ipv4.sysctl_tcp_timestamps = 1;
-- 
1.8.3.1
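
As a rough illustration of the defaults set in the tcp_sk_init() hunk above, here is a minimal userspace sketch, not kernel code. It assumes that cnt is derived as ehash_mask + 1, as in the lines the patch removes from tcp_init(); the struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical container for the two defaults derived from cnt. */
struct tcp_defaults {
	int max_syn_backlog;
	int max_orphans;
};

/* Assumption: cnt = ehash_mask + 1, mirroring the removed tcp_init() lines. */
static struct tcp_defaults tcp_defaults_from_ehash(unsigned int ehash_mask)
{
	struct tcp_defaults d;
	int cnt;

	assert(ehash_mask < 0x7fffffffu);	/* keep cnt within int range */
	cnt = (int)(ehash_mask + 1);

	/* max(128, cnt / 256), as in the tcp_sk_init() hunk */
	d.max_syn_backlog = cnt / 256 > 128 ? cnt / 256 : 128;
	/* cnt / 2, the value the patch assigns per namespace */
	d.max_orphans = cnt / 2;
	return d;
}
```

Because tcp_sk_init() runs for every new namespace, each namespace would start with the same cnt / 2 value rather than a share of it.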


* Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob
  2017-09-07  3:10 [PATCH] ipv4: Namespaceify tcp_max_orphans knob Haishuang Yan
@ 2017-09-08 22:13 ` Cong Wang
  2017-09-09  1:25   ` 严海双
  0 siblings, 1 reply; 7+ messages in thread
From: Cong Wang @ 2017-09-08 22:13 UTC (permalink / raw)
  To: Haishuang Yan
  Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Eric Dumazet, Linux Kernel Network Developers, LKML

On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
<yanhaishuang@cmss.chinamobile.com> wrote:
> Different namespace application might require different maximal number
> of TCP sockets independently of the host.

So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
in a whole system, right? This just makes OOM easier to trigger.
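
The arithmetic behind this concern can be sketched in userspace C. This is only an illustration, assuming (hypothetically) that each namespace enforces its own limit against its own orphans; the function names and any example values are not from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* With a single system-wide knob, the ceiling on orphaned sockets is
 * that one value, however many namespaces exist. */
static long global_ceiling(long max_orphans)
{
	assert(max_orphans >= 0);
	return max_orphans;
}

/* Hypothetical per-namespace enforcement: each namespace may hold up
 * to its own limit, so the system-wide ceiling is the sum of limits. */
static long per_netns_ceiling(const long *limits, size_t nr_ns)
{
	long sum = 0;

	for (size_t i = 0; i < nr_ns; i++) {
		assert(limits[i] >= 0);
		sum += limits[i];
	}
	return sum;
}
```

With N namespaces all left at the same default, per_netns_ceiling() grows to N times that default, while global_ceiling() stays constant.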


* Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob
  2017-09-08 22:13 ` Cong Wang
@ 2017-09-09  1:25   ` 严海双
  2017-09-09  4:35     ` Cong Wang
  0 siblings, 1 reply; 7+ messages in thread
From: 严海双 @ 2017-09-09  1:25 UTC (permalink / raw)
  To: Cong Wang
  Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Eric Dumazet, Linux Kernel Network Developers, LKML



> On Sep 9, 2017, at 6:13 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> 
> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
> <yanhaishuang@cmss.chinamobile.com> wrote:
>> Different namespace application might require different maximal number
>> of TCP sockets independently of the host.
> 
> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
> in a whole system, right? This just makes OOM easier to trigger.
> 

From my understanding, before the patch we had N * net->ipv4.sysctl_tcp_max_orphans,
and after the patch we could have ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans
+ ns3.sysctl_tcp_max_orphans. Is that right? Thanks for your review.


* Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob
  2017-09-09  1:25   ` 严海双
@ 2017-09-09  4:35     ` Cong Wang
  2017-09-09  5:09       ` 严海双
  0 siblings, 1 reply; 7+ messages in thread
From: Cong Wang @ 2017-09-09  4:35 UTC (permalink / raw)
  To: 严海双
  Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Eric Dumazet, Linux Kernel Network Developers, LKML

On Fri, Sep 8, 2017 at 6:25 PM, 严海双 <yanhaishuang@cmss.chinamobile.com> wrote:
>
>
>> On Sep 9, 2017, at 6:13 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>
>> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
>> <yanhaishuang@cmss.chinamobile.com> wrote:
>>> Different namespace application might require different maximal number
>>> of TCP sockets independently of the host.
>>
>> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
>> in a whole system, right? This just makes OOM easier to trigger.
>>
>
> From my understanding, before the patch, we had N * net->ipv4.sysctl_tcp_max_orphans,
> and after the patch, we could have ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans
> + ns3.sysctl_tcp_max_orphans, is that right? Thanks for your reviewing.

Nope, by N I mean the number of containers. Before your patch, the limit
is global, after your patch it is per container.


* Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob
  2017-09-09  4:35     ` Cong Wang
@ 2017-09-09  5:09       ` 严海双
  2017-09-09  5:16         ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: 严海双 @ 2017-09-09  5:09 UTC (permalink / raw)
  To: Cong Wang
  Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Eric Dumazet, Linux Kernel Network Developers, LKML



> On Sep 9, 2017, at 12:35 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> 
> On Fri, Sep 8, 2017 at 6:25 PM, 严海双 <yanhaishuang@cmss.chinamobile.com> wrote:
>> 
>> 
>>> On Sep 9, 2017, at 6:13 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> 
>>> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
>>> <yanhaishuang@cmss.chinamobile.com> wrote:
>>>> Different namespace application might require different maximal number
>>>> of TCP sockets independently of the host.
>>> 
>>> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
>>> in a whole system, right? This just makes OOM easier to trigger.
>>> 
>> 
>> From my understanding, before the patch, we had N * net->ipv4.sysctl_tcp_max_orphans,
>> and after the patch, we could have ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans
>> + ns3.sysctl_tcp_max_orphans, is that right? Thanks for your reviewing.
> 
> Nope, by N I mean the number of containers. Before your patch, the limit
> is global, after your patch it is per container.
> 

Yeah, for example, if there are N containers, I mean that before the patch the limit is:

	N * net->ipv4.sysctl_tcp_max_orphans

After the patch, the limit is:

	ns1.net->ipv4.sysctl_tcp_max_orphans + ns2.net->ipv4.sysctl_tcp_max_orphans + …


* Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob
  2017-09-09  5:09       ` 严海双
@ 2017-09-09  5:16         ` David Miller
  2017-09-09 10:21           ` 严海双
  0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2017-09-09  5:16 UTC (permalink / raw)
  To: yanhaishuang
  Cc: xiyou.wangcong, kuznet, yoshfuji, edumazet, netdev, linux-kernel

From: 严海双 <yanhaishuang@cmss.chinamobile.com>
Date: Sat, 9 Sep 2017 13:09:57 +0800

> 
> 
>> On Sep 9, 2017, at 12:35 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> 
>> On Fri, Sep 8, 2017 at 6:25 PM, 严海双 <yanhaishuang@cmss.chinamobile.com> wrote:
>>> 
>>> 
>>>> On Sep 9, 2017, at 6:13 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>> 
>>>> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
>>>> <yanhaishuang@cmss.chinamobile.com> wrote:
>>>>> Different namespace application might require different maximal number
>>>>> of TCP sockets independently of the host.
>>>> 
>>>> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
>>>> in a whole system, right? This just makes OOM easier to trigger.
>>>> 
>>> 
>>> From my understanding, before the patch, we had N * net->ipv4.sysctl_tcp_max_orphans,
>>> and after the patch, we could have ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans
>>> + ns3.sysctl_tcp_max_orphans, is that right? Thanks for your reviewing.
>> 
>> Nope, by N I mean the number of containers. Before your patch, the limit
>> is global, after your patch it is per container.
>> 
> 
> Yeah, for example, if there is N containers, before the patch, I mean the limit is:
> 
> 	N * net->ipv4.sysctl_tcp_max_orphans
> 
> After the patch, the limit is:
> 
> 	ns1. net->ipv4.sysctl_tcp_max_orphans + ns2. net->ipv4.sysctl_tcp_max_orphans + …

Not true.

Please remove "N" from your equation of the current situation.

"sysctl_tcp_max_orphans" applies to entire system, it is a global limit,
comparing one limit against all orphans in the system, there is no N.
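
A minimal userspace model of that comparison, for illustration only. It follows the shape of the check in the tcp_too_many_orphans() hunk (orphans << shift > limit) but is not the kernel code; the function names, the array of per-namespace counts, and the shift bound are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of the check in the patch hunk: orphans << shift > limit.
 * The shift is widened to long long to avoid overflow in the model. */
static bool too_many_orphans(int orphans, int shift, int max_orphans)
{
	assert(orphans >= 0);
	assert(shift >= 0 && shift < 31);
	return ((long long)orphans << shift) > max_orphans;
}

/* The orphan counter is not split by namespace, so the value being
 * compared is the total across every namespace on the system. */
static bool system_too_many_orphans(const int *per_ns_orphans, size_t nr_ns,
				    int shift, int max_orphans)
{
	long long total = 0;

	assert(shift >= 0 && shift < 31);
	for (size_t i = 0; i < nr_ns; i++) {
		assert(per_ns_orphans[i] >= 0);
		total += per_ns_orphans[i];
	}
	return (total << shift) > max_orphans;
}
```

In this model, three namespaces holding 30000 orphans each trip a limit of 65536 together, even though none of them would exceed it alone.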


* Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob
  2017-09-09  5:16         ` David Miller
@ 2017-09-09 10:21           ` 严海双
  0 siblings, 0 replies; 7+ messages in thread
From: 严海双 @ 2017-09-09 10:21 UTC (permalink / raw)
  To: David Miller
  Cc: xiyou.wangcong, kuznet, yoshfuji, edumazet, netdev, linux-kernel



> On Sep 9, 2017, at 1:16 PM, David Miller <davem@davemloft.net> wrote:
> 
> From: 严海双 <yanhaishuang@cmss.chinamobile.com>
> Date: Sat, 9 Sep 2017 13:09:57 +0800
> 
>> 
>> 
>>> On Sep 9, 2017, at 12:35 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> 
>>> On Fri, Sep 8, 2017 at 6:25 PM, 严海双 <yanhaishuang@cmss.chinamobile.com> wrote:
>>>> 
>>>> 
>>>>> On Sep 9, 2017, at 6:13 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>> 
>>>>> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
>>>>> <yanhaishuang@cmss.chinamobile.com> wrote:
>>>>>> Different namespace application might require different maximal number
>>>>>> of TCP sockets independently of the host.
>>>>> 
>>>>> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
>>>>> in a whole system, right? This just makes OOM easier to trigger.
>>>>> 
>>>> 
>>>> From my understanding, before the patch, we had N * net->ipv4.sysctl_tcp_max_orphans,
>>>> and after the patch, we could have ns1.sysctl_tcp_max_orphans + ns2.sysctl_tcp_max_orphans
>>>> + ns3.sysctl_tcp_max_orphans, is that right? Thanks for your reviewing.
>>> 
>>> Nope, by N I mean the number of containers. Before your patch, the limit
>>> is global, after your patch it is per container.
>>> 
>> 
>> Yeah, for example, if there is N containers, before the patch, I mean the limit is:
>> 
>> 	N * net->ipv4.sysctl_tcp_max_orphans
>> 
>> After the patch, the limit is:
>> 
>> 	ns1. net->ipv4.sysctl_tcp_max_orphans + ns2. net->ipv4.sysctl_tcp_max_orphans + …
> 
> Not true.
> 
> Please remove "N" from your equation of the current situation.
> 
> "sysctl_tcp_max_orphans" applies to entire system, it is a global limit,
> comparing one limit against all orphans in the system, there is no N.

Yes, you're right. I browsed the source code and found that it is a global limit;
sorry for my mistake.

Thanks David and Cong.

