linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/2] tuntap: reduce memory using of queues
@ 2013-01-23 13:59 Jason Wang
  2013-01-23 13:59 ` [PATCH 2/2] tuntap: limit the number of flow caches Jason Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Jason Wang @ 2013-01-23 13:59 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel
  Cc: Jason Wang, Eric Dumazet, David Woodhouse

Currently MAX_TAP_QUEUES (1024) queues are always allocated for a tuntap
device, even when userspace only requires a single-queue device. This is
unnecessary and leads to a very high order page allocation that has a high
chance of failing. Solve this by creating a single-queue net device when
userspace only uses one queue, and also reduce MAX_TAP_QUEUES to
DEFAULT_MAX_NUM_RSS_QUEUES, which guarantees that the allocation succeeds.
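
For reference, whether the device ends up single- or multi-queue is decided by
the IFF_MULTI_QUEUE flag userspace passes at TUNSETIFF time. A minimal
userspace sketch (illustrative only; the helper name is made up and error
handling is trimmed):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <linux/if_tun.h>

/* Open a tap device; with multi_queue set the kernel sizes the netdevice
 * for MAX_TAP_QUEUES queues, otherwise it now allocates a single queue. */
static int tun_alloc(const char *name, int multi_queue)
{
	struct ifreq ifr;
	int fd = open("/dev/net/tun", O_RDWR);

	if (fd < 0)
		return -1;

	memset(&ifr, 0, sizeof(ifr));
	ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
	if (multi_queue)
		ifr.ifr_flags |= IFF_MULTI_QUEUE;
	strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

	if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}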

Reported-by: Dirk Hohndel <dirk@hohndel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index c81680d..8939d21 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -109,11 +109,10 @@ struct tap_filter {
 	unsigned char	addr[FLT_EXACT_COUNT][ETH_ALEN];
 };
 
-/* 1024 is probably a high enough limit: modern hypervisors seem to support on
- * the order of 100-200 CPUs so this leaves us some breathing space if we want
- * to match a queue per guest CPU.
- */
-#define MAX_TAP_QUEUES 1024
+/* DEFAULT_MAX_NUM_RSS_QUEUES was chosen so that the rx/tx queues allocated for
+ * the netdevice fit in one page, which guarantees that the memory
+ * allocation succeeds. TODO: increase the limit. */
+#define MAX_TAP_QUEUES DEFAULT_MAX_NUM_RSS_QUEUES
 
 #define TUN_FLOW_EXPIRE (3 * HZ)
 
@@ -1583,6 +1582,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 	else {
 		char *name;
 		unsigned long flags = 0;
+		int queues = ifr->ifr_flags & IFF_MULTI_QUEUE ?
+			     MAX_TAP_QUEUES : 1;
 
 		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
@@ -1606,8 +1607,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 			name = ifr->ifr_name;
 
 		dev = alloc_netdev_mqs(sizeof(struct tun_struct), name,
-				       tun_setup,
-				       MAX_TAP_QUEUES, MAX_TAP_QUEUES);
+				       tun_setup, queues, queues);
+
 		if (!dev)
 			return -ENOMEM;
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] tuntap: limit the number of flow caches
  2013-01-23 13:59 [PATCH 1/2] tuntap: reduce memory using of queues Jason Wang
@ 2013-01-23 13:59 ` Jason Wang
  2013-01-23 18:50   ` David Miller
  2013-01-23 15:16 ` [PATCH 1/2] tuntap: reduce memory using of queues Michael S. Tsirkin
  2013-01-23 18:47 ` David Miller
  2 siblings, 1 reply; 6+ messages in thread
From: Jason Wang @ 2013-01-23 13:59 UTC (permalink / raw)
  To: davem, mst, netdev, linux-kernel; +Cc: Jason Wang, Stephen Hemminger

We create a new flow cache entry whenever a new flow is identified by tuntap.
This may lead to several issues:

- userspace may produce a huge amount of short-lived flows to exhaust host memory
- the unlimited number of flow cache entries may produce a long list, which
  increases the time spent in the linear search

Solve this by introducing a limit on the total number of flow caches.
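
To make the intended behaviour concrete, here is a simplified, self-contained
model of the capped insertion (names here are illustrative, not the driver's;
the real code below uses RCU hlists under tun->lock, and this sketch only
demonstrates the MAX_TAP_FLOWS bound):

#include <stdlib.h>

#define MAX_TAP_FLOWS 4096

struct flow_entry {
	unsigned int rxhash;
	unsigned int queue_index;
	struct flow_entry *next;
};

struct flow_table {
	struct flow_entry *head;
	unsigned int count;	/* plays the role of tun->flow_count */
};

/* Refresh an existing entry, or create one only while under the cap. */
static int flow_update(struct flow_table *t, unsigned int rxhash,
		       unsigned int queue_index)
{
	struct flow_entry *e;

	for (e = t->head; e; e = e->next) {
		if (e->rxhash == rxhash) {
			e->queue_index = queue_index;	/* existing flow */
			return 0;
		}
	}

	if (t->count >= MAX_TAP_FLOWS)	/* cap reached: do not grow the list */
		return -1;

	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->rxhash = rxhash;
	e->queue_index = queue_index;
	e->next = t->head;
	t->head = e;
	t->count++;
	return 0;
}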

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 8939d21..cc09b67 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -113,6 +113,7 @@ struct tap_filter {
  * the netdevice fit in one page, which guarantees that the memory
  * allocation succeeds. TODO: increase the limit. */
 #define MAX_TAP_QUEUES DEFAULT_MAX_NUM_RSS_QUEUES
+#define MAX_TAP_FLOWS  4096
 
 #define TUN_FLOW_EXPIRE (3 * HZ)
 
@@ -185,6 +186,7 @@ struct tun_struct {
 	unsigned int numdisabled;
 	struct list_head disabled;
 	void *security;
+	u32 flow_count;
 };
 
 static inline u32 tun_hashfn(u32 rxhash)
@@ -218,6 +220,7 @@ static struct tun_flow_entry *tun_flow_create(struct tun_struct *tun,
 		e->queue_index = queue_index;
 		e->tun = tun;
 		hlist_add_head_rcu(&e->hash_link, head);
+		++tun->flow_count;
 	}
 	return e;
 }
@@ -228,6 +231,7 @@ static void tun_flow_delete(struct tun_struct *tun, struct tun_flow_entry *e)
 		  e->rxhash, e->queue_index);
 	hlist_del_rcu(&e->hash_link);
 	kfree_rcu(e, rcu);
+	--tun->flow_count;
 }
 
 static void tun_flow_flush(struct tun_struct *tun)
@@ -317,7 +321,8 @@ static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
 		e->updated = jiffies;
 	} else {
 		spin_lock_bh(&tun->lock);
-		if (!tun_flow_find(head, rxhash))
+		if (!tun_flow_find(head, rxhash) &&
+		    tun->flow_count < MAX_TAP_FLOWS)
 			tun_flow_create(tun, head, rxhash, queue_index);
 
 		if (!timer_pending(&tun->flow_gc_timer))
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] tuntap: reduce memory using of queues
  2013-01-23 13:59 [PATCH 1/2] tuntap: reduce memory using of queues Jason Wang
  2013-01-23 13:59 ` [PATCH 2/2] tuntap: limit the number of flow caches Jason Wang
@ 2013-01-23 15:16 ` Michael S. Tsirkin
  2013-01-23 18:47 ` David Miller
  2 siblings, 0 replies; 6+ messages in thread
From: Michael S. Tsirkin @ 2013-01-23 15:16 UTC (permalink / raw)
  To: Jason Wang; +Cc: davem, netdev, linux-kernel, Eric Dumazet, David Woodhouse

On Wed, Jan 23, 2013 at 09:59:12PM +0800, Jason Wang wrote:
> Currently MAX_TAP_QUEUES (1024) queues are always allocated for a tuntap
> device, even when userspace only requires a single-queue device. This is
> unnecessary and leads to a very high order page allocation that has a high
> chance of failing. Solve this by creating a single-queue net device when
> userspace only uses one queue, and also reduce MAX_TAP_QUEUES to
> DEFAULT_MAX_NUM_RSS_QUEUES, which guarantees that the allocation succeeds.
> 
> Reported-by: Dirk Hohndel <dirk@hohndel.org>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: David Woodhouse <dwmw2@infradead.org>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Note: this is a 3.8 patch; it fixes a regression.

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/tun.c |   15 ++++++++-------
>  1 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index c81680d..8939d21 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -109,11 +109,10 @@ struct tap_filter {
>  	unsigned char	addr[FLT_EXACT_COUNT][ETH_ALEN];
>  };
>  
> -/* 1024 is probably a high enough limit: modern hypervisors seem to support on
> - * the order of 100-200 CPUs so this leaves us some breathing space if we want
> - * to match a queue per guest CPU.
> - */
> -#define MAX_TAP_QUEUES 1024
> +/* DEFAULT_MAX_NUM_RSS_QUEUES was chosen so that the rx/tx queues allocated for
> + * the netdevice fit in one page, which guarantees that the memory
> + * allocation succeeds. TODO: increase the limit. */
> +#define MAX_TAP_QUEUES DEFAULT_MAX_NUM_RSS_QUEUES
>  
>  #define TUN_FLOW_EXPIRE (3 * HZ)
>  
> @@ -1583,6 +1582,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>  	else {
>  		char *name;
>  		unsigned long flags = 0;
> +		int queues = ifr->ifr_flags & IFF_MULTI_QUEUE ?
> +			     MAX_TAP_QUEUES : 1;
>  
>  		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
>  			return -EPERM;
> @@ -1606,8 +1607,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
>  			name = ifr->ifr_name;
>  
>  		dev = alloc_netdev_mqs(sizeof(struct tun_struct), name,
> -				       tun_setup,
> -				       MAX_TAP_QUEUES, MAX_TAP_QUEUES);
> +				       tun_setup, queues, queues);
> +
>  		if (!dev)
>  			return -ENOMEM;
>  
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] tuntap: reduce memory using of queues
  2013-01-23 13:59 [PATCH 1/2] tuntap: reduce memory using of queues Jason Wang
  2013-01-23 13:59 ` [PATCH 2/2] tuntap: limit the number of flow caches Jason Wang
  2013-01-23 15:16 ` [PATCH 1/2] tuntap: reduce memory using of queues Michael S. Tsirkin
@ 2013-01-23 18:47 ` David Miller
  2 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2013-01-23 18:47 UTC (permalink / raw)
  To: jasowang; +Cc: mst, netdev, linux-kernel, eric.dumazet, dwmw2

From: Jason Wang <jasowang@redhat.com>
Date: Wed, 23 Jan 2013 21:59:12 +0800

> Currently MAX_TAP_QUEUES (1024) queues are always allocated for a tuntap
> device, even when userspace only requires a single-queue device. This is
> unnecessary and leads to a very high order page allocation that has a high
> chance of failing. Solve this by creating a single-queue net device when
> userspace only uses one queue, and also reduce MAX_TAP_QUEUES to
> DEFAULT_MAX_NUM_RSS_QUEUES, which guarantees that the allocation succeeds.
> 
> Reported-by: Dirk Hohndel <dirk@hohndel.org>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: David Woodhouse <dwmw2@infradead.org>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Applied.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] tuntap: limit the number of flow caches
  2013-01-23 13:59 ` [PATCH 2/2] tuntap: limit the number of flow caches Jason Wang
@ 2013-01-23 18:50   ` David Miller
  2013-01-24  3:22     ` Jason Wang
  0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2013-01-23 18:50 UTC (permalink / raw)
  To: jasowang; +Cc: mst, netdev, linux-kernel, stephen

From: Jason Wang <jasowang@redhat.com>
Date: Wed, 23 Jan 2013 21:59:13 +0800

> We create a new flow cache entry whenever a new flow is identified by tuntap.
> This may lead to several issues:
> 
> - userspace may produce a huge amount of short-lived flows to exhaust host memory
> - the unlimited number of flow cache entries may produce a long list, which
>   increases the time spent in the linear search
> 
> Solve this by introducing a limit on the total number of flow caches.
> 
> Cc: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---

Applied, but really flow caches are an extremely broken concept especially
when external entities control the population of such caches.

We removed the routing cache of the ipv4 networking code exactly because
this kind of crap does not work at all.

Next you're going to have to add a delicately managed garbage
collection scheme for this tuntap flow cache, and that will be tuned
endlessly, when the real issue is that fundamentally this does not
work.

Instead, make the full lookup scale properly and use appropriate data
structures.  It won't be as fast as a simple hash table demux, but
it'll actually be immune to growth issues and DoS attacks and give
consistent and repeatable lookup performance regardless of traffic
patterns.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] tuntap: limit the number of flow caches
  2013-01-23 18:50   ` David Miller
@ 2013-01-24  3:22     ` Jason Wang
  0 siblings, 0 replies; 6+ messages in thread
From: Jason Wang @ 2013-01-24  3:22 UTC (permalink / raw)
  To: David Miller; +Cc: mst, netdev, linux-kernel, stephen

On 01/24/2013 02:50 AM, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Wed, 23 Jan 2013 21:59:13 +0800
>
>> We create a new flow cache entry whenever a new flow is identified by tuntap.
>> This may lead to several issues:
>>
>> - userspace may produce a huge amount of short-lived flows to exhaust host memory
>> - the unlimited number of flow cache entries may produce a long list, which
>>   increases the time spent in the linear search
>>
>> Solve this by introducing a limit on the total number of flow caches.
>>
>> Cc: Stephen Hemminger <stephen@networkplumber.org>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
> Applied, but really flow caches are an extremely broken concept especially
> when external entities control the population of such caches.
>
> We removed the routing cache of the ipv4 networking code exactly because
> this kind of crap does not work at all.
>
> Next you're going to have to add a delicately managed garbage
> collection scheme for this tuntap flow cache, and that will be tuned
> endlessly, when the real issue is that fundamentally this does not
> work.
>
> Instead, make the full lookup scale properly and use appropriate data
> structures.  It won't be as fast as a simple hash table demux, but
> it'll actually be immune to growth issues and DoS attacks and give
> consistent and repeatable lookup performance regardless of traffic
> patterns.

Ok, I will rework it in 3.9.

Thanks
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-01-24  3:23 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-23 13:59 [PATCH 1/2] tuntap: reduce memory using of queues Jason Wang
2013-01-23 13:59 ` [PATCH 2/2] tuntap: limit the number of flow caches Jason Wang
2013-01-23 18:50   ` David Miller
2013-01-24  3:22     ` Jason Wang
2013-01-23 15:16 ` [PATCH 1/2] tuntap: reduce memory using of queues Michael S. Tsirkin
2013-01-23 18:47 ` David Miller
