linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Make ipmr queue length configurable
@ 2019-07-25 20:42 Brodie Greenfield
  2019-07-25 20:42 ` [PATCH 1/2] ipmr: Make cache " Brodie Greenfield
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Brodie Greenfield @ 2019-07-25 20:42 UTC (permalink / raw)
  To: davem, stephen, kuznet, yoshfuji, netdev
  Cc: linux-kernel, chris.packham, luuk.paulussen, Brodie Greenfield

We want to have some more space in our queue for processing incoming
multicast packets, so we can process more of them without dropping
them prematurely. It is useful to be able to increase this limit on
higher-spec platforms that can handle more items.

For the particular use case here at Allied Telesis, we have linux
running on our switches and routers, with support for the number of
multicast groups being increased. Basically, this queue length affects
the time taken to fully learn all of the multicast streams. 

Changes in v3:
 - Corrected a v4 to v6 typo.

Changes in v2:
 - Tidy up a few unnecessary bits of code.
 - Put the sysctl inside ip multicast ifdef.
 - Included the IPv6 version.

Brodie Greenfield (2):
  ipmr: Make cache queue length configurable
  ip6mr: Make cache queue length configurable

 Documentation/networking/ip-sysctl.txt | 16 ++++++++++++++++
 include/net/netns/ipv4.h               |  1 +
 include/net/netns/ipv6.h               |  1 +
 net/ipv4/af_inet.c                     |  1 +
 net/ipv4/ipmr.c                        |  4 +++-
 net/ipv4/sysctl_net_ipv4.c             |  7 +++++++
 net/ipv6/af_inet6.c                    |  1 +
 net/ipv6/ip6mr.c                       |  4 +++-
 net/ipv6/sysctl_net_ipv6.c             |  7 +++++++
 9 files changed, 40 insertions(+), 2 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] ipmr: Make cache queue length configurable
  2019-07-25 20:42 [PATCH 0/2] Make ipmr queue length configurable Brodie Greenfield
@ 2019-07-25 20:42 ` Brodie Greenfield
  2019-07-26 10:10   ` Stephen Suryaputra
  2019-07-26 11:05   ` Nikolay Aleksandrov
  2019-07-25 20:42 ` [PATCH 2/2] ip6mr: " Brodie Greenfield
  2019-07-27 20:18 ` [PATCH 0/2] Make ipmr " David Miller
  2 siblings, 2 replies; 12+ messages in thread
From: Brodie Greenfield @ 2019-07-25 20:42 UTC (permalink / raw)
  To: davem, stephen, kuznet, yoshfuji, netdev
  Cc: linux-kernel, chris.packham, luuk.paulussen, Brodie Greenfield

We want to be able to keep more spaces available in our queue for
processing incoming multicast traffic (adding (S,G) entries) - this lets
us learn more groups faster, rather than dropping them at this stage.

Signed-off-by: Brodie Greenfield <brodie.greenfield@alliedtelesis.co.nz>
---
 Documentation/networking/ip-sysctl.txt | 8 ++++++++
 include/net/netns/ipv4.h               | 1 +
 net/ipv4/af_inet.c                     | 1 +
 net/ipv4/ipmr.c                        | 4 +++-
 net/ipv4/sysctl_net_ipv4.c             | 7 +++++++
 5 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index acdfb5d2bcaa..02f77e932adf 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -887,6 +887,14 @@ ip_local_reserved_ports - list of comma separated ranges
 
 	Default: Empty
 
+ip_mr_cache_queue_length - INTEGER
+	Limit the number of multicast packets we can have in the queue to be
+	resolved.
+	Bear in mind that when an unresolved multicast packet is received,
+	there is an O(n) traversal of the queue. This should be considered
+	if increasing.
+	Default: 10
+
 ip_unprivileged_port_start - INTEGER
 	This is a per-namespace sysctl.  It defines the first
 	unprivileged port in the network namespace.  Privileged ports
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 104a6669e344..3411d3f18d51 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -187,6 +187,7 @@ struct netns_ipv4 {
 	int sysctl_igmp_max_msf;
 	int sysctl_igmp_llm_reports;
 	int sysctl_igmp_qrv;
+	unsigned int sysctl_ip_mr_cache_queue_length;
 
 	struct ping_group_range ping_group_range;
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 0dfb72c46671..8e25538bdb1e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1827,6 +1827,7 @@ static __net_init int inet_init_net(struct net *net)
 	net->ipv4.sysctl_igmp_llm_reports = 1;
 	net->ipv4.sysctl_igmp_qrv = 2;
 
+	net->ipv4.sysctl_ip_mr_cache_queue_length = 10;
 	return 0;
 }
 
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index ddbf8c9a1abb..c6a6c3e453a9 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1127,6 +1127,7 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
 				 struct sk_buff *skb, struct net_device *dev)
 {
 	const struct iphdr *iph = ip_hdr(skb);
+	struct net *net = dev_net(dev);
 	struct mfc_cache *c;
 	bool found = false;
 	int err;
@@ -1142,7 +1143,8 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
 
 	if (!found) {
 		/* Create a new entry if allowable */
-		if (atomic_read(&mrt->cache_resolve_queue_len) >= 10 ||
+		if (atomic_read(&mrt->cache_resolve_queue_len) >=
+		    net->ipv4.sysctl_ip_mr_cache_queue_length ||
 		    (c = ipmr_cache_alloc_unres()) == NULL) {
 			spin_unlock_bh(&mfc_unres_lock);
 
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index ba0fc4b18465..78ae86e8c6cb 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -784,6 +784,13 @@ static struct ctl_table ipv4_net_table[] = {
 		.proc_handler	= proc_dointvec
 	},
 #ifdef CONFIG_IP_MULTICAST
+	{
+		.procname	= "ip_mr_cache_queue_length",
+		.data		= &init_net.ipv4.sysctl_ip_mr_cache_queue_length,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 	{
 		.procname	= "igmp_qrv",
 		.data		= &init_net.ipv4.sysctl_igmp_qrv,
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] ip6mr: Make cache queue length configurable
  2019-07-25 20:42 [PATCH 0/2] Make ipmr queue length configurable Brodie Greenfield
  2019-07-25 20:42 ` [PATCH 1/2] ipmr: Make cache " Brodie Greenfield
@ 2019-07-25 20:42 ` Brodie Greenfield
  2019-07-26 10:10   ` Stephen Suryaputra
  2019-07-27 20:18 ` [PATCH 0/2] Make ipmr " David Miller
  2 siblings, 1 reply; 12+ messages in thread
From: Brodie Greenfield @ 2019-07-25 20:42 UTC (permalink / raw)
  To: davem, stephen, kuznet, yoshfuji, netdev
  Cc: linux-kernel, chris.packham, luuk.paulussen, Brodie Greenfield

We want to be able to keep more spaces available in our queue for
processing incoming IPv6 multicast traffic (adding (S,G) entries) - this
lets us learn more groups faster, rather than dropping them at this stage.

Signed-off-by: Brodie Greenfield <brodie.greenfield@alliedtelesis.co.nz>
---
 Documentation/networking/ip-sysctl.txt | 8 ++++++++
 include/net/netns/ipv6.h               | 1 +
 net/ipv6/af_inet6.c                    | 1 +
 net/ipv6/ip6mr.c                       | 4 +++-
 net/ipv6/sysctl_net_ipv6.c             | 7 +++++++
 5 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 02f77e932adf..68eada3ca915 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1481,6 +1481,14 @@ skip_notify_on_dev_down - BOOLEAN
 	on userspace caches to track link events and evict routes.
 	Default: false (generate message)
 
+ip6_mr_cache_queue_length - INTEGER
+	Limit the number of multicast packets we can have in the queue to be
+	resolved.
+	Bear in mind that when an unresolved multicast packet is received,
+	there is an O(n) traversal of the queue. This should be considered
+	if increasing.
+	Default: 10
+
 IPv6 Fragmentation:
 
 ip6frag_high_thresh - INTEGER
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index ef1ed529f33c..84b58424c799 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -46,6 +46,7 @@ struct netns_sysctl_ipv6 {
 	int max_hbh_opts_len;
 	int seg6_flowlabel;
 	bool skip_notify_on_dev_down;
+	unsigned int ip6_mr_cache_queue_length;
 };
 
 struct netns_ipv6 {
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index d99753b5e39b..6551bb63e5a2 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -856,6 +856,7 @@ static int __net_init inet6_net_init(struct net *net)
 	net->ipv6.sysctl.max_hbh_opts_cnt = IP6_DEFAULT_MAX_HBH_OPTS_CNT;
 	net->ipv6.sysctl.max_dst_opts_len = IP6_DEFAULT_MAX_DST_OPTS_LEN;
 	net->ipv6.sysctl.max_hbh_opts_len = IP6_DEFAULT_MAX_HBH_OPTS_LEN;
+	net->ipv6.sysctl.ip6_mr_cache_queue_length = 10;
 	atomic_set(&net->ipv6.fib6_sernum, 1);
 
 	err = ipv6_init_mibs(net);
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index cc01aa3f2b5e..bb445871437e 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1135,6 +1135,7 @@ static int ip6mr_cache_report(struct mr_table *mrt, struct sk_buff *pkt,
 static int ip6mr_cache_unresolved(struct mr_table *mrt, mifi_t mifi,
 				  struct sk_buff *skb, struct net_device *dev)
 {
+	struct net *net = dev_net(dev);
 	struct mfc6_cache *c;
 	bool found = false;
 	int err;
@@ -1153,7 +1154,8 @@ static int ip6mr_cache_unresolved(struct mr_table *mrt, mifi_t mifi,
 		 *	Create a new entry if allowable
 		 */
 
-		if (atomic_read(&mrt->cache_resolve_queue_len) >= 10 ||
+		if (atomic_read(&mrt->cache_resolve_queue_len) >=
+		    net->ipv6.sysctl.ip6_mr_cache_queue_length ||
 		    (c = ip6mr_cache_alloc_unres()) == NULL) {
 			spin_unlock_bh(&mfc_unres_lock);
 
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index e15cd37024fd..a27299d4cc34 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -159,6 +159,13 @@ static struct ctl_table ipv6_table_template[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+	{
+		.procname	= "ip6_mr_cache_queue_length",
+		.data		= &init_net.ipv6.sysctl.ip6_mr_cache_queue_length,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 	{ }
 };
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] ipmr: Make cache queue length configurable
  2019-07-25 20:42 ` [PATCH 1/2] ipmr: Make cache " Brodie Greenfield
@ 2019-07-26 10:10   ` Stephen Suryaputra
  2019-07-26 11:05   ` Nikolay Aleksandrov
  1 sibling, 0 replies; 12+ messages in thread
From: Stephen Suryaputra @ 2019-07-26 10:10 UTC (permalink / raw)
  To: Brodie Greenfield
  Cc: davem, stephen, kuznet, yoshfuji, netdev, linux-kernel,
	chris.packham, luuk.paulussen

On Fri, Jul 26, 2019 at 08:42:29AM +1200, Brodie Greenfield wrote:
> We want to be able to keep more spaces available in our queue for
> processing incoming multicast traffic (adding (S,G) entries) - this lets
> us learn more groups faster, rather than dropping them at this stage.
> 
> Signed-off-by: Brodie Greenfield <brodie.greenfield@alliedtelesis.co.nz>

Our system can use this. The patch applied cleanly to my net-next
sandbox. Thank you.

Reviewed-by: Stephen Suryaputra <ssuryaextr@gmail.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] ip6mr: Make cache queue length configurable
  2019-07-25 20:42 ` [PATCH 2/2] ip6mr: " Brodie Greenfield
@ 2019-07-26 10:10   ` Stephen Suryaputra
  0 siblings, 0 replies; 12+ messages in thread
From: Stephen Suryaputra @ 2019-07-26 10:10 UTC (permalink / raw)
  To: Brodie Greenfield
  Cc: davem, stephen, kuznet, yoshfuji, netdev, linux-kernel,
	chris.packham, luuk.paulussen

On Fri, Jul 26, 2019 at 08:42:30AM +1200, Brodie Greenfield wrote:
> We want to be able to keep more spaces available in our queue for
> processing incoming IPv6 multicast traffic (adding (S,G) entries) - this
> lets us learn more groups faster, rather than dropping them at this stage.
> 
> Signed-off-by: Brodie Greenfield <brodie.greenfield@alliedtelesis.co.nz>

Reviewed-by: Stephen Suryaputra <ssuryaextr@gmail.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] ipmr: Make cache queue length configurable
  2019-07-25 20:42 ` [PATCH 1/2] ipmr: Make cache " Brodie Greenfield
  2019-07-26 10:10   ` Stephen Suryaputra
@ 2019-07-26 11:05   ` Nikolay Aleksandrov
  2019-07-26 11:15     ` Nikolay Aleksandrov
  1 sibling, 1 reply; 12+ messages in thread
From: Nikolay Aleksandrov @ 2019-07-26 11:05 UTC (permalink / raw)
  To: Brodie Greenfield, davem, stephen, kuznet, yoshfuji, netdev
  Cc: linux-kernel, chris.packham, luuk.paulussen

On 25/07/2019 23:42, Brodie Greenfield wrote:
> We want to be able to keep more spaces available in our queue for
> processing incoming multicast traffic (adding (S,G) entries) - this lets
> us learn more groups faster, rather than dropping them at this stage.
> 
> Signed-off-by: Brodie Greenfield <brodie.greenfield@alliedtelesis.co.nz>
> ---
>  Documentation/networking/ip-sysctl.txt | 8 ++++++++
>  include/net/netns/ipv4.h               | 1 +
>  net/ipv4/af_inet.c                     | 1 +
>  net/ipv4/ipmr.c                        | 4 +++-
>  net/ipv4/sysctl_net_ipv4.c             | 7 +++++++
>  5 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index acdfb5d2bcaa..02f77e932adf 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -887,6 +887,14 @@ ip_local_reserved_ports - list of comma separated ranges
>  
>  	Default: Empty
>  
> +ip_mr_cache_queue_length - INTEGER
> +	Limit the number of multicast packets we can have in the queue to be
> +	resolved.
> +	Bear in mind that when an unresolved multicast packet is received,
> +	there is an O(n) traversal of the queue. This should be considered
> +	if increasing.
> +	Default: 10
> +

Hi,
You've said it yourself - it has linear traversal time, but doesn't this patch allow any netns on the
system to increase its limit to any value, thus possibly affecting others ?
Though the socket limit will kick in at some point. I think that's where David
was going with his suggestion back in 2018:
https://www.spinics.net/lists/netdev/msg514543.html

If we add this sysctl now, we'll be stuck with it. I'd prefer David's suggestion
so we can rely only on the receive queue queue limit which is already configurable. 
We still need to be careful with the defaults though, the NOCACHE entry is 128 bytes
and with the skb overhead currently on my setup we end up at about 277 entries default limit.

Cheers,
 Nik

>  ip_unprivileged_port_start - INTEGER
>  	This is a per-namespace sysctl.  It defines the first
>  	unprivileged port in the network namespace.  Privileged ports
> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> index 104a6669e344..3411d3f18d51 100644
> --- a/include/net/netns/ipv4.h
> +++ b/include/net/netns/ipv4.h
> @@ -187,6 +187,7 @@ struct netns_ipv4 {
>  	int sysctl_igmp_max_msf;
>  	int sysctl_igmp_llm_reports;
>  	int sysctl_igmp_qrv;
> +	unsigned int sysctl_ip_mr_cache_queue_length;
>  
>  	struct ping_group_range ping_group_range;
>  
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 0dfb72c46671..8e25538bdb1e 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1827,6 +1827,7 @@ static __net_init int inet_init_net(struct net *net)
>  	net->ipv4.sysctl_igmp_llm_reports = 1;
>  	net->ipv4.sysctl_igmp_qrv = 2;
>  
> +	net->ipv4.sysctl_ip_mr_cache_queue_length = 10;
>  	return 0;
>  }
>  
> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
> index ddbf8c9a1abb..c6a6c3e453a9 100644
> --- a/net/ipv4/ipmr.c
> +++ b/net/ipv4/ipmr.c
> @@ -1127,6 +1127,7 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
>  				 struct sk_buff *skb, struct net_device *dev)
>  {
>  	const struct iphdr *iph = ip_hdr(skb);
> +	struct net *net = dev_net(dev);
>  	struct mfc_cache *c;
>  	bool found = false;
>  	int err;
> @@ -1142,7 +1143,8 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
>  
>  	if (!found) {
>  		/* Create a new entry if allowable */
> -		if (atomic_read(&mrt->cache_resolve_queue_len) >= 10 ||
> +		if (atomic_read(&mrt->cache_resolve_queue_len) >=
> +		    net->ipv4.sysctl_ip_mr_cache_queue_length ||
>  		    (c = ipmr_cache_alloc_unres()) == NULL) {
>  			spin_unlock_bh(&mfc_unres_lock);
>  
> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
> index ba0fc4b18465..78ae86e8c6cb 100644
> --- a/net/ipv4/sysctl_net_ipv4.c
> +++ b/net/ipv4/sysctl_net_ipv4.c
> @@ -784,6 +784,13 @@ static struct ctl_table ipv4_net_table[] = {
>  		.proc_handler	= proc_dointvec
>  	},
>  #ifdef CONFIG_IP_MULTICAST
> +	{
> +		.procname	= "ip_mr_cache_queue_length",
> +		.data		= &init_net.ipv4.sysctl_ip_mr_cache_queue_length,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec
> +	},
>  	{
>  		.procname	= "igmp_qrv",
>  		.data		= &init_net.ipv4.sysctl_igmp_qrv,
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] ipmr: Make cache queue length configurable
  2019-07-26 11:05   ` Nikolay Aleksandrov
@ 2019-07-26 11:15     ` Nikolay Aleksandrov
  2019-07-27 17:03       ` Stephen Suryaputra
  0 siblings, 1 reply; 12+ messages in thread
From: Nikolay Aleksandrov @ 2019-07-26 11:15 UTC (permalink / raw)
  To: Brodie Greenfield, davem, stephen, kuznet, yoshfuji, netdev
  Cc: linux-kernel, chris.packham, luuk.paulussen

On 26/07/2019 14:05, Nikolay Aleksandrov wrote:
> On 25/07/2019 23:42, Brodie Greenfield wrote:
>> We want to be able to keep more spaces available in our queue for
>> processing incoming multicast traffic (adding (S,G) entries) - this lets
>> us learn more groups faster, rather than dropping them at this stage.
>>
>> Signed-off-by: Brodie Greenfield <brodie.greenfield@alliedtelesis.co.nz>
>> ---
>>  Documentation/networking/ip-sysctl.txt | 8 ++++++++
>>  include/net/netns/ipv4.h               | 1 +
>>  net/ipv4/af_inet.c                     | 1 +
>>  net/ipv4/ipmr.c                        | 4 +++-
>>  net/ipv4/sysctl_net_ipv4.c             | 7 +++++++
>>  5 files changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
>> index acdfb5d2bcaa..02f77e932adf 100644
>> --- a/Documentation/networking/ip-sysctl.txt
>> +++ b/Documentation/networking/ip-sysctl.txt
>> @@ -887,6 +887,14 @@ ip_local_reserved_ports - list of comma separated ranges
>>  
>>  	Default: Empty
>>  
>> +ip_mr_cache_queue_length - INTEGER
>> +	Limit the number of multicast packets we can have in the queue to be
>> +	resolved.
>> +	Bear in mind that when an unresolved multicast packet is received,
>> +	there is an O(n) traversal of the queue. This should be considered
>> +	if increasing.
>> +	Default: 10
>> +
> 
> Hi,
> You've said it yourself - it has linear traversal time, but doesn't this patch allow any netns on the
> system to increase its limit to any value, thus possibly affecting others ?
> Though the socket limit will kick in at some point. I think that's where David
> was going with his suggestion back in 2018:
> https://www.spinics.net/lists/netdev/msg514543.html
> 
> If we add this sysctl now, we'll be stuck with it. I'd prefer David's suggestion
> so we can rely only on the receive queue queue limit which is already configurable. 
> We still need to be careful with the defaults though, the NOCACHE entry is 128 bytes
> and with the skb overhead currently on my setup we end up at about 277 entries default limit.

I mean that people might be surprised if they increased that limit by default, that's the
only problem I'm not sure how to handle. Maybe we need some hard limit anyway.
Have you done any tests what value works for your setup ?

In the end we might have to go with this patch, but perhaps limit the per-netns sysctl
to the init_ns value as maximum (similar to what we did for frags) or don't make it per-netns
at all.

> 
> Cheers,
>  Nik
> 
>>  ip_unprivileged_port_start - INTEGER
>>  	This is a per-namespace sysctl.  It defines the first
>>  	unprivileged port in the network namespace.  Privileged ports
>> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
>> index 104a6669e344..3411d3f18d51 100644
>> --- a/include/net/netns/ipv4.h
>> +++ b/include/net/netns/ipv4.h
>> @@ -187,6 +187,7 @@ struct netns_ipv4 {
>>  	int sysctl_igmp_max_msf;
>>  	int sysctl_igmp_llm_reports;
>>  	int sysctl_igmp_qrv;
>> +	unsigned int sysctl_ip_mr_cache_queue_length;
>>  
>>  	struct ping_group_range ping_group_range;
>>  
>> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
>> index 0dfb72c46671..8e25538bdb1e 100644
>> --- a/net/ipv4/af_inet.c
>> +++ b/net/ipv4/af_inet.c
>> @@ -1827,6 +1827,7 @@ static __net_init int inet_init_net(struct net *net)
>>  	net->ipv4.sysctl_igmp_llm_reports = 1;
>>  	net->ipv4.sysctl_igmp_qrv = 2;
>>  
>> +	net->ipv4.sysctl_ip_mr_cache_queue_length = 10;
>>  	return 0;
>>  }
>>  
>> diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
>> index ddbf8c9a1abb..c6a6c3e453a9 100644
>> --- a/net/ipv4/ipmr.c
>> +++ b/net/ipv4/ipmr.c
>> @@ -1127,6 +1127,7 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
>>  				 struct sk_buff *skb, struct net_device *dev)
>>  {
>>  	const struct iphdr *iph = ip_hdr(skb);
>> +	struct net *net = dev_net(dev);
>>  	struct mfc_cache *c;
>>  	bool found = false;
>>  	int err;
>> @@ -1142,7 +1143,8 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
>>  
>>  	if (!found) {
>>  		/* Create a new entry if allowable */
>> -		if (atomic_read(&mrt->cache_resolve_queue_len) >= 10 ||
>> +		if (atomic_read(&mrt->cache_resolve_queue_len) >=
>> +		    net->ipv4.sysctl_ip_mr_cache_queue_length ||
>>  		    (c = ipmr_cache_alloc_unres()) == NULL) {
>>  			spin_unlock_bh(&mfc_unres_lock);
>>  
>> diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
>> index ba0fc4b18465..78ae86e8c6cb 100644
>> --- a/net/ipv4/sysctl_net_ipv4.c
>> +++ b/net/ipv4/sysctl_net_ipv4.c
>> @@ -784,6 +784,13 @@ static struct ctl_table ipv4_net_table[] = {
>>  		.proc_handler	= proc_dointvec
>>  	},
>>  #ifdef CONFIG_IP_MULTICAST
>> +	{
>> +		.procname	= "ip_mr_cache_queue_length",
>> +		.data		= &init_net.ipv4.sysctl_ip_mr_cache_queue_length,
>> +		.maxlen		= sizeof(int),
>> +		.mode		= 0644,
>> +		.proc_handler	= proc_dointvec
>> +	},
>>  	{
>>  		.procname	= "igmp_qrv",
>>  		.data		= &init_net.ipv4.sysctl_igmp_qrv,
>>
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] ipmr: Make cache queue length configurable
  2019-07-26 11:15     ` Nikolay Aleksandrov
@ 2019-07-27 17:03       ` Stephen Suryaputra
  0 siblings, 0 replies; 12+ messages in thread
From: Stephen Suryaputra @ 2019-07-27 17:03 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Brodie Greenfield, David Miller, Stephen Hemminger, kuznet,
	yoshfuji, netdev, linux-kernel, chris.packham, luuk.paulussen

On Fri, Jul 26, 2019 at 7:18 AM Nikolay Aleksandrov
<nikolay@cumulusnetworks.com> wrote:

> > You've said it yourself - it has linear traversal time, but doesn't this patch allow any netns on the
> > system to increase its limit to any value, thus possibly affecting others ?
> > Though the socket limit will kick in at some point. I think that's where David
> > was going with his suggestion back in 2018:
> > https://www.spinics.net/lists/netdev/msg514543.html
> >
> > If we add this sysctl now, we'll be stuck with it. I'd prefer David's suggestion
> > so we can rely only on the receive queue queue limit which is already configurable.
> > We still need to be careful with the defaults though, the NOCACHE entry is 128 bytes
> > and with the skb overhead currently on my setup we end up at about 277 entries default limit.
>
> I mean that people might be surprised if they increased that limit by default, that's the
> only problem I'm not sure how to handle. Maybe we need some hard limit anyway.
> Have you done any tests what value works for your setup ?

FYI: for ours, it is 2048.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/2] Make ipmr queue length configurable
  2019-07-25 20:42 [PATCH 0/2] Make ipmr queue length configurable Brodie Greenfield
  2019-07-25 20:42 ` [PATCH 1/2] ipmr: Make cache " Brodie Greenfield
  2019-07-25 20:42 ` [PATCH 2/2] ip6mr: " Brodie Greenfield
@ 2019-07-27 20:18 ` David Miller
  2 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2019-07-27 20:18 UTC (permalink / raw)
  To: brodie.greenfield
  Cc: stephen, kuznet, yoshfuji, netdev, linux-kernel, chris.packham,
	luuk.paulussen

From: Brodie Greenfield <brodie.greenfield@alliedtelesis.co.nz>
Date: Fri, 26 Jul 2019 08:42:28 +1200

> We want to have some more space in our queue for processing incoming
> multicast packets, so we can process more of them without dropping
> them prematurely. It is useful to be able to increase this limit on
> higher-spec platforms that can handle more items.
> 
> For the particular use case here at Allied Telesis, we have linux
> running on our switches and routers, with support for the number of
> multicast groups being increased. Basically, this queue length affects
> the time taken to fully learn all of the multicast streams. 
> 
> Changes in v3:
>  - Corrected a v4 to v6 typo.

As others have voiced, I think it's dangerous to let every netns
increase this so readily.

We need to either put in a non-initns limit or simply not allow
non-init namespaces to change this.

But really socket queue limits are a better place to enforce this.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] ipmr: Make cache queue length configurable
  2019-03-06 20:19 ` [PATCH 1/2] ipmr: Make cache " Brodie Greenfield
@ 2019-03-07 15:40   ` Stephen Hemminger
  0 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2019-03-07 15:40 UTC (permalink / raw)
  To: Brodie Greenfield
  Cc: davem, kuznet, yoshfuji, netdev, linux-kernel, chris.packham,
	luuk.paulussen

On Thu,  7 Mar 2019 09:19:55 +1300
Brodie Greenfield <brodie.greenfield@alliedtelesis.co.nz> wrote:

> +ip_mr_cache_queue_length - INTEGER
> +	Limit the number of multicast packets we can have in the queue to be
> +	resolved.
> +	Bear in mind that when an unresolved multicast packet is received,
> +	there is an O(n) traversal of the queue. This should be considered
> +	if increasing.
> +

Why not make it to a unsigned value? A negative value doesn't make
much sense here.

Although other sysctl values date back to a time when Linux was
sloppy about allowing negative values, it would be good to use unsigned
now.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] ipmr: Make cache queue length configurable
  2019-03-07  4:57 Brodie Greenfield
@ 2019-03-07  4:57 ` Brodie Greenfield
  0 siblings, 0 replies; 12+ messages in thread
From: Brodie Greenfield @ 2019-03-07  4:57 UTC (permalink / raw)
  To: davem, stephen, kuznet, yoshfuji, netdev
  Cc: linux-kernel, chris.packham, luuk.paulussen, Brodie Greenfield

We want to be able to keep more spaces available in our queue for
processing incoming multicast traffic (adding (S,G) entries) - this lets
us learn more groups faster, rather than dropping them at this stage.

Signed-off-by: Brodie Greenfield <brodie.greenfield@alliedtelesis.co.nz>
---
 Documentation/networking/ip-sysctl.txt | 8 ++++++++
 include/net/netns/ipv4.h               | 1 +
 net/ipv4/af_inet.c                     | 1 +
 net/ipv4/ipmr.c                        | 4 +++-
 net/ipv4/sysctl_net_ipv4.c             | 7 +++++++
 5 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index acdfb5d2bcaa..02f77e932adf 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -887,6 +887,14 @@ ip_local_reserved_ports - list of comma separated ranges
 
 	Default: Empty
 
+ip_mr_cache_queue_length - INTEGER
+	Limit the number of multicast packets we can have in the queue to be
+	resolved.
+	Bear in mind that when an unresolved multicast packet is received,
+	there is an O(n) traversal of the queue. This should be considered
+	if increasing.
+	Default: 10
+
 ip_unprivileged_port_start - INTEGER
 	This is a per-namespace sysctl.  It defines the first
 	unprivileged port in the network namespace.  Privileged ports
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 104a6669e344..3411d3f18d51 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -187,6 +187,7 @@ struct netns_ipv4 {
 	int sysctl_igmp_max_msf;
 	int sysctl_igmp_llm_reports;
 	int sysctl_igmp_qrv;
+	unsigned int sysctl_ip_mr_cache_queue_length;
 
 	struct ping_group_range ping_group_range;
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 0dfb72c46671..8e25538bdb1e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1827,6 +1827,7 @@ static __net_init int inet_init_net(struct net *net)
 	net->ipv4.sysctl_igmp_llm_reports = 1;
 	net->ipv4.sysctl_igmp_qrv = 2;
 
+	net->ipv4.sysctl_ip_mr_cache_queue_length = 10;
 	return 0;
 }
 
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index ddbf8c9a1abb..c6a6c3e453a9 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1127,6 +1127,7 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
 				 struct sk_buff *skb, struct net_device *dev)
 {
 	const struct iphdr *iph = ip_hdr(skb);
+	struct net *net = dev_net(dev);
 	struct mfc_cache *c;
 	bool found = false;
 	int err;
@@ -1142,7 +1143,8 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
 
 	if (!found) {
 		/* Create a new entry if allowable */
-		if (atomic_read(&mrt->cache_resolve_queue_len) >= 10 ||
+		if (atomic_read(&mrt->cache_resolve_queue_len) >=
+		    net->ipv4.sysctl_ip_mr_cache_queue_length ||
 		    (c = ipmr_cache_alloc_unres()) == NULL) {
 			spin_unlock_bh(&mfc_unres_lock);
 
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index ba0fc4b18465..78ae86e8c6cb 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -784,6 +784,13 @@ static struct ctl_table ipv4_net_table[] = {
 		.proc_handler	= proc_dointvec
 	},
 #ifdef CONFIG_IP_MULTICAST
+	{
+		.procname	= "ip_mr_cache_queue_length",
+		.data		= &init_net.ipv4.sysctl_ip_mr_cache_queue_length,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 	{
 		.procname	= "igmp_qrv",
 		.data		= &init_net.ipv4.sysctl_igmp_qrv,
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 1/2] ipmr: Make cache queue length configurable
  2019-03-06 20:19 [PATCH 0/2] Make ipmr " Brodie Greenfield
@ 2019-03-06 20:19 ` Brodie Greenfield
  2019-03-07 15:40   ` Stephen Hemminger
  0 siblings, 1 reply; 12+ messages in thread
From: Brodie Greenfield @ 2019-03-06 20:19 UTC (permalink / raw)
  To: davem, stephen, kuznet, yoshfuji, netdev
  Cc: linux-kernel, chris.packham, luuk.paulussen, Brodie Greenfield

We want to be able to keep more spaces available in our queue for
processing incoming multicast traffic (adding (S,G) entries) - this lets
us learn more groups faster, rather than dropping them at this stage.

Signed-off-by: Brodie Greenfield <brodie.greenfield@alliedtelesis.co.nz>
---
 Documentation/networking/ip-sysctl.txt | 8 ++++++++
 include/net/netns/ipv4.h               | 1 +
 net/ipv4/af_inet.c                     | 1 +
 net/ipv4/ipmr.c                        | 4 +++-
 net/ipv4/sysctl_net_ipv4.c             | 7 +++++++
 5 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index acdfb5d2bcaa..02f77e932adf 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -887,6 +887,14 @@ ip_local_reserved_ports - list of comma separated ranges
 
 	Default: Empty
 
+ip_mr_cache_queue_length - INTEGER
+	Limit the number of multicast packets we can have in the queue to be
+	resolved.
+	Bear in mind that when an unresolved multicast packet is received,
+	there is an O(n) traversal of the queue. This should be considered
+	if increasing.
+	Default: 10
+
 ip_unprivileged_port_start - INTEGER
 	This is a per-namespace sysctl.  It defines the first
 	unprivileged port in the network namespace.  Privileged ports
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 104a6669e344..3411d3f18d51 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -187,6 +187,7 @@ struct netns_ipv4 {
 	int sysctl_igmp_max_msf;
 	int sysctl_igmp_llm_reports;
 	int sysctl_igmp_qrv;
+	unsigned int sysctl_ip_mr_cache_queue_length;
 
 	struct ping_group_range ping_group_range;
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 0dfb72c46671..8e25538bdb1e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1827,6 +1827,7 @@ static __net_init int inet_init_net(struct net *net)
 	net->ipv4.sysctl_igmp_llm_reports = 1;
 	net->ipv4.sysctl_igmp_qrv = 2;
 
+	net->ipv4.sysctl_ip_mr_cache_queue_length = 10;
 	return 0;
 }
 
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index ddbf8c9a1abb..c6a6c3e453a9 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1127,6 +1127,7 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
 				 struct sk_buff *skb, struct net_device *dev)
 {
 	const struct iphdr *iph = ip_hdr(skb);
+	struct net *net = dev_net(dev);
 	struct mfc_cache *c;
 	bool found = false;
 	int err;
@@ -1142,7 +1143,8 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
 
 	if (!found) {
 		/* Create a new entry if allowable */
-		if (atomic_read(&mrt->cache_resolve_queue_len) >= 10 ||
+		if (atomic_read(&mrt->cache_resolve_queue_len) >=
+		    net->ipv4.sysctl_ip_mr_cache_queue_length ||
 		    (c = ipmr_cache_alloc_unres()) == NULL) {
 			spin_unlock_bh(&mfc_unres_lock);
 
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index ba0fc4b18465..78ae86e8c6cb 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -784,6 +784,13 @@ static struct ctl_table ipv4_net_table[] = {
 		.proc_handler	= proc_dointvec
 	},
 #ifdef CONFIG_IP_MULTICAST
+	{
+		.procname	= "ip_mr_cache_queue_length",
+		.data		= &init_net.ipv4.sysctl_ip_mr_cache_queue_length,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 	{
 		.procname	= "igmp_qrv",
 		.data		= &init_net.ipv4.sysctl_igmp_qrv,
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-07-27 20:18 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-25 20:42 [PATCH 0/2] Make ipmr queue length configurable Brodie Greenfield
2019-07-25 20:42 ` [PATCH 1/2] ipmr: Make cache " Brodie Greenfield
2019-07-26 10:10   ` Stephen Suryaputra
2019-07-26 11:05   ` Nikolay Aleksandrov
2019-07-26 11:15     ` Nikolay Aleksandrov
2019-07-27 17:03       ` Stephen Suryaputra
2019-07-25 20:42 ` [PATCH 2/2] ip6mr: " Brodie Greenfield
2019-07-26 10:10   ` Stephen Suryaputra
2019-07-27 20:18 ` [PATCH 0/2] Make ipmr " David Miller
  -- strict thread matches above, loose matches on Subject: below --
2019-03-07  4:57 Brodie Greenfield
2019-03-07  4:57 ` [PATCH 1/2] ipmr: Make cache " Brodie Greenfield
2019-03-06 20:19 [PATCH 0/2] Make ipmr " Brodie Greenfield
2019-03-06 20:19 ` [PATCH 1/2] ipmr: Make cache " Brodie Greenfield
2019-03-07 15:40   ` Stephen Hemminger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).