* [PATCH net-net 0/4] Increase the limit of tuntap queues
@ 2014-11-18 16:22 Pankaj Gupta
  2014-11-18 16:22 ` [PATCH net-next 1/4] net: allow large number of rx queues Pankaj Gupta
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Pankaj Gupta @ 2014-11-18 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: davem, jasowang, mst, dgibson, vfalico, edumazet, vyasevic,
	hkchu, wuzhy, xemul, therbert, bhutchings, xii, stephen, jiri,
	sergei.shtylyov, Pankaj Gupta

This patch series is a follow-up to the RFC posted at:
https://lkml.org/lkml/2014/8/18/392

Changes from the RFC:
PATCH 1: Sergei Shtylyov - Add an empty line after declarations.
PATCH 2: Jiri Pirko - Do not introduce new module parameters.
	 Michael S. Tsirkin - We can use sysctl to limit the maximum number
                              of queues.

Networking under KVM works best if we allocate a per-vCPU rx and tx
queue in a virtual NIC, which requires a per-vCPU queue on the host side.
Modern physical NICs already support a large number of queues. To let a
vNIC scale its queues up to the maximum number of vCPUs, we need to
increase the number of queues supported by tuntap.

This series increases the limit of tuntap queues. The original work was
done by Jason Wang (jasowang@redhat.com); I am using his patch series at
https://lkml.org/lkml/2013/6/19/29 as a reference. As discussed in that
series:

Two reasons prevented us from increasing the number of tun queues:

- The netdev_queue array in the netdevice is allocated through kmalloc(), which
  may require a high order memory allocation when the device has several queues.
  E.g. sizeof(struct netdev_queue) is 320 bytes, so a high order allocation is
  needed once the device has more than 16 queues; with the 256 queues this
  series targets, the array alone is 80 KiB of physically contiguous memory.

- The flow hash buckets are stored directly in tun_struct, which makes
  tun_struct very large; such a high order memory allocation fails easily when
  memory is fragmented.

Commit 60877a32bce00041528576e6b8df5abe9251fa73 increased the number of tx
queues by falling back to vzalloc() when kmalloc() fails.

This series addresses the following issues:

- Increase the number of rx queues (the netdev_rx_queue array) in the same way
  as is done for tx queues, by falling back to vzalloc() when allocation with
  kmalloc() fails.

- Switch the flow caches to a flex array to avoid higher order allocations.

- Accept the maximum number of queues as a sysctl parameter so that a user
  space application such as libvirt can use this value to limit the number of
  queues, and administrators can change the maximum by updating the sysctl
  entry (see the usage sketch after this list).

- Increase the maximum number of queues to 256, which equals the maximum
  number of vCPUs allowed in a guest.
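
For illustration (not part of the original series), a minimal userspace sketch
of how a management application such as libvirt could honor the proposed limit
before creating a multiqueue tap device; the /proc path follows the ctl_table
added in patch 2, while the helper name, the requested queue count and the
fallback value are hypothetical:

    #include <stdio.h>

    /* Read the proposed tuntap queue limit; fall back to a conservative
     * default if the sysctl file does not exist on this kernel. */
    static int read_tun_max_queues(void)
    {
            FILE *f = fopen("/proc/sys/net/tuntap/tun_max_queues", "r");
            int limit = 8;  /* assumed fallback when the sysctl is absent */

            if (f) {
                    if (fscanf(f, "%d", &limit) != 1)
                            limit = 8;
                    fclose(f);
            }
            return limit;
    }

    int main(void)
    {
            int wanted = 64;        /* queues requested for the guest */
            int max = read_tun_max_queues();

            printf("creating %d queues (sysctl limit %d)\n",
                   wanted < max ? wanted : max, max);
            return 0;
    }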

I have done some regression testing with a sample program that creates
tun/tap devices as single queue and multiqueue devices, and it seems to be
working fine. I will also post performance numbers.

  tuntap: Increase the number of queues in tun
  tuntap: Reduce the size of tun_struct by using flex array
  tuntap: Accept tuntap max queue length as sysctl entry
  net: allow large number of rx queues

 drivers/net/tun.c |   91 +++++++++++++++++++++++++++++++++++---------
 net/core/dev.c    |   19 ++++++---
 2 files changed, 86 insertions(+), 24 deletions(-)


* [PATCH net-next 1/4] net: allow large number of rx queues
  2014-11-18 16:22 [PATCH net-net 0/4] Increase the limit of tuntap queues Pankaj Gupta
@ 2014-11-18 16:22 ` Pankaj Gupta
  2014-11-18 20:29   ` Cong Wang
  2014-11-18 16:22 ` [PATCH net-next 2/4] tuntap: Accept tuntap maximum number of queues as sysctl Pankaj Gupta
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Pankaj Gupta @ 2014-11-18 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: davem, jasowang, mst, dgibson, vfalico, edumazet, vyasevic,
	hkchu, wuzhy, xemul, therbert, bhutchings, xii, stephen, jiri,
	sergei.shtylyov, Pankaj Gupta

netif_alloc_rx_queues() uses kcalloc() to allocate memory
for the "struct netdev_rx_queue *_rx" array.
For a large rx queue allocation kcalloc() might fail, so
this patch adds a fallback to vzalloc(), mirroring what is
already done for tx queue allocation in
netif_alloc_netdev_queues().

Avoiding the failure of high order memory allocations with
the help of vzalloc() allows us to allocate large rx and tx
queue arrays, which in turn lets us increase the number of
queues in tun.

As vmalloc() adds overhead on a critical network path, the
__GFP_REPEAT flag is used with kzalloc() so that this
fallback happens only when really needed.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Gibson <dgibson@redhat.com>
---
 net/core/dev.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index e916ba8..abe9560 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6059,17 +6059,25 @@ void netif_stacked_transfer_operstate(const struct net_device *rootdev,
 EXPORT_SYMBOL(netif_stacked_transfer_operstate);
 
 #ifdef CONFIG_SYSFS
+static void netif_free_rx_queues(struct net_device *dev)
+{
+	kvfree(dev->_rx);
+}
+
 static int netif_alloc_rx_queues(struct net_device *dev)
 {
 	unsigned int i, count = dev->num_rx_queues;
 	struct netdev_rx_queue *rx;
+	size_t sz = count * sizeof(*rx);
 
 	BUG_ON(count < 1);
 
-	rx = kcalloc(count, sizeof(struct netdev_rx_queue), GFP_KERNEL);
-	if (!rx)
-		return -ENOMEM;
-
+	rx = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
+	if (!rx) {
+		rx = vzalloc(sz);
+		if (!rx)
+			return -ENOMEM;
+	}
 	dev->_rx = rx;
 
 	for (i = 0; i < count; i++)
@@ -6698,9 +6706,8 @@ void free_netdev(struct net_device *dev)
 
 	netif_free_tx_queues(dev);
 #ifdef CONFIG_SYSFS
-	kfree(dev->_rx);
+	netif_free_rx_queues(dev);
 #endif
-
 	kfree(rcu_dereference_protected(dev->ingress_queue, 1));
 
 	/* Flush device addresses */
-- 
1.8.3.1



* [PATCH net-next 2/4] tuntap: Accept tuntap maximum number of queues as sysctl
  2014-11-18 16:22 [PATCH net-net 0/4] Increase the limit of tuntap queues Pankaj Gupta
  2014-11-18 16:22 ` [PATCH net-next 1/4] net: allow large number of rx queues Pankaj Gupta
@ 2014-11-18 16:22 ` Pankaj Gupta
  2014-11-18 16:22 ` [PATCH net-next 3/4] tuntap: reduce the size of tun_struct by using flex array Pankaj Gupta
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Pankaj Gupta @ 2014-11-18 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: davem, jasowang, mst, dgibson, vfalico, edumazet, vyasevic,
	hkchu, wuzhy, xemul, therbert, bhutchings, xii, stephen, jiri,
	sergei.shtylyov, Pankaj Gupta

This patch exposes the maximum number of tun/tap queues as a sysctl
entry, which a user space application such as libvirt can use to limit
the maximum number of tuntap queues. The value of the sysctl entry is
writable at runtime.

If no value is set for the sysctl entry 'net.tuntap.tun_max_queues',
a default of 256 is used, which equals the maximum number of vCPUs
allowed by KVM.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/net/tun.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e3fa65a..b03a745 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -71,6 +71,7 @@
 #include <net/rtnetlink.h>
 #include <net/sock.h>
 #include <linux/seq_file.h>
+#include <linux/sysctl.h>
 #include <linux/uio.h>
 
 #include <asm/uaccess.h>
@@ -117,10 +118,16 @@ struct tap_filter {
  * the netdevice to be fit in one page. So we can make sure the success of
  * memory allocation. TODO: increase the limit. */
 #define MAX_TAP_QUEUES DEFAULT_MAX_NUM_RSS_QUEUES
+#define MIN_TAP_QUEUES 1
 #define MAX_TAP_FLOWS  4096
 
 #define TUN_FLOW_EXPIRE (3 * HZ)
 
+static struct ctl_table_header *tun_sysctl_header;
+static int tun_queues = MAX_TAP_QUEUES;
+static int min_queues = MIN_TAP_QUEUES;
+static int max_queues = MAX_TAP_QUEUES;
+
 /* A tun_file connects an open character device to a tuntap netdevice. It
  * also contains all socket related structures (except sock_fprog and tap_filter)
  * to serve as one transmit queue for tuntap device. The sock_fprog and
@@ -197,6 +204,19 @@ struct tun_struct {
 	u32 flow_count;
 };
 
+static struct ctl_table tun_ctl_table[] = {
+	{
+		.procname       = "tun_max_queues",
+		.data           = &tun_queues,
+		.maxlen         = sizeof(int),
+		.mode           = 0644,
+		.proc_handler   = proc_dointvec_minmax,
+		.extra1         = &min_queues,
+		.extra2         = &max_queues
+	},
+	{  }
+};
+
 static inline u32 tun_hashfn(u32 rxhash)
 {
 	return rxhash & 0x3ff;
@@ -547,7 +567,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file, bool skip_filte
 
 	err = -E2BIG;
 	if (!tfile->detached &&
-	    tun->numqueues + tun->numdisabled == MAX_TAP_QUEUES)
+	    tun->numqueues + tun->numdisabled == tun_queues)
 		goto out;
 
 	err = 0;
@@ -1624,7 +1644,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		char *name;
 		unsigned long flags = 0;
 		int queues = ifr->ifr_flags & IFF_MULTI_QUEUE ?
-			     MAX_TAP_QUEUES : 1;
+			     tun_queues : 1;
 
 		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
@@ -2335,6 +2355,13 @@ static int __init tun_init(void)
 	pr_info("%s, %s\n", DRV_DESCRIPTION, DRV_VERSION);
 	pr_info("%s\n", DRV_COPYRIGHT);
 
+	tun_sysctl_header = register_net_sysctl(&init_net, "net/tuntap",
+						tun_ctl_table);
+
+	if (!tun_sysctl_header)
+		pr_err("Can't register tun_ctl_table. Tun device queue"
+		       "setting to default value : %d queues.\n", tun_queues);
+
 	ret = rtnl_link_register(&tun_link_ops);
 	if (ret) {
 		pr_err("Can't register link_ops\n");
@@ -2357,6 +2384,8 @@ static void tun_cleanup(void)
 {
 	misc_deregister(&tun_miscdev);
 	rtnl_link_unregister(&tun_link_ops);
+	if (tun_sysctl_header)
+		unregister_net_sysctl_table(tun_sysctl_header);
 }
 
 /* Get an underlying socket object from tun file.  Returns error unless file is
-- 
1.8.3.1



* [PATCH net-next 3/4] tuntap: reduce the size of tun_struct by using flex array.
  2014-11-18 16:22 [PATCH net-net 0/4] Increase the limit of tuntap queues Pankaj Gupta
  2014-11-18 16:22 ` [PATCH net-next 1/4] net: allow large number of rx queues Pankaj Gupta
  2014-11-18 16:22 ` [PATCH net-next 2/4] tuntap: Accept tuntap maximum number of queues as sysctl Pankaj Gupta
@ 2014-11-18 16:22 ` Pankaj Gupta
  2014-11-18 16:22 ` [PATCH net-next 4/4] tuntap: Increase the number of queues in tun Pankaj Gupta
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Pankaj Gupta @ 2014-11-18 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: davem, jasowang, mst, dgibson, vfalico, edumazet, vyasevic,
	hkchu, wuzhy, xemul, therbert, bhutchings, xii, stephen, jiri,
	sergei.shtylyov, Pankaj Gupta

This patch switches the flow caches to a flex array, which brings
several advantages:

- It reduces the size of the tun_struct structure, which allows us to raise
  the upper limit on queues in the future.
- It avoids higher order memory allocations. This will be useful when
  switching to pure hashing in the flow cache, which may demand a larger
  array in the future.

After this patch, the size of tun_struct on x86_64 is reduced from 8512
bytes to 328 bytes: the embedded array of TUN_NUM_FLOW_ENTRIES (1024)
hlist_head buckets is replaced by a single flex_array pointer.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: David Gibson <dgibson@redhat.com>
---
 drivers/net/tun.c | 49 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 37 insertions(+), 12 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e3fa65a..bd07a6d 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -65,6 +65,7 @@
 #include <linux/nsproxy.h>
 #include <linux/virtio_net.h>
 #include <linux/rcupdate.h>
+#include <linux/flex_array.h>
 #include <net/ipv6.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
@@ -188,7 +189,7 @@ struct tun_struct {
 	int debug;
 #endif
 	spinlock_t lock;
-	struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
+	struct flex_array *flows;
 	struct timer_list flow_gc_timer;
 	unsigned long ageing_time;
 	unsigned int numdisabled;
@@ -249,10 +250,11 @@ static void tun_flow_flush(struct tun_struct *tun)
 
 	spin_lock_bh(&tun->lock);
 	for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++) {
+		struct hlist_head *h = flex_array_get(tun->flows, i);
 		struct tun_flow_entry *e;
 		struct hlist_node *n;
 
-		hlist_for_each_entry_safe(e, n, &tun->flows[i], hash_link)
+		hlist_for_each_entry_safe(e, n, h, hash_link)
 			tun_flow_delete(tun, e);
 	}
 	spin_unlock_bh(&tun->lock);
@@ -264,10 +266,11 @@ static void tun_flow_delete_by_queue(struct tun_struct *tun, u16 queue_index)
 
 	spin_lock_bh(&tun->lock);
 	for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++) {
+		struct hlist_head *h = flex_array_get(tun->flows, i);
 		struct tun_flow_entry *e;
 		struct hlist_node *n;
 
-		hlist_for_each_entry_safe(e, n, &tun->flows[i], hash_link) {
+		hlist_for_each_entry_safe(e, n, h, hash_link) {
 			if (e->queue_index == queue_index)
 				tun_flow_delete(tun, e);
 		}
@@ -287,10 +290,11 @@ static void tun_flow_cleanup(unsigned long data)
 
 	spin_lock_bh(&tun->lock);
 	for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++) {
+		struct hlist_head *h = flex_array_get(tun->flows, i);
 		struct tun_flow_entry *e;
 		struct hlist_node *n;
 
-		hlist_for_each_entry_safe(e, n, &tun->flows[i], hash_link) {
+		hlist_for_each_entry_safe(e, n, h, hash_link) {
 			unsigned long this_timer;
 			count++;
 			this_timer = e->updated + delay;
@@ -317,7 +321,7 @@ static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
 	if (!rxhash)
 		return;
 	else
-		head = &tun->flows[tun_hashfn(rxhash)];
+		head = flex_array_get(tun->flows, tun_hashfn(rxhash));
 
 	rcu_read_lock();
 
@@ -380,7 +384,8 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb,
 
 	txq = skb_get_hash(skb);
 	if (txq) {
-		e = tun_flow_find(&tun->flows[tun_hashfn(txq)], txq);
+		e = tun_flow_find(flex_array_get(tun->flows,
+						 tun_hashfn(txq)), txq);
 		if (e) {
 			tun_flow_save_rps_rxhash(e, txq);
 			txq = e->queue_index;
@@ -760,8 +765,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 		rxhash = skb_get_hash(skb);
 		if (rxhash) {
 			struct tun_flow_entry *e;
-			e = tun_flow_find(&tun->flows[tun_hashfn(rxhash)],
-					rxhash);
+			e = tun_flow_find(flex_array_get(tun->flows,
+							 tun_hashfn(rxhash)), rxhash);
 			if (e)
 				tun_flow_save_rps_rxhash(e, rxhash);
 		}
@@ -896,23 +901,40 @@ static const struct net_device_ops tap_netdev_ops = {
 #endif
 };
 
-static void tun_flow_init(struct tun_struct *tun)
+static int tun_flow_init(struct tun_struct *tun)
 {
-	int i;
+	struct flex_array *buckets;
+	int i, err;
+
+	buckets = flex_array_alloc(sizeof(struct hlist_head),
+				   TUN_NUM_FLOW_ENTRIES, GFP_KERNEL);
+	if (!buckets)
+		return -ENOMEM;
+
+	err = flex_array_prealloc(buckets, 0, TUN_NUM_FLOW_ENTRIES, GFP_KERNEL);
+	if (err) {
+		flex_array_free(buckets);
+		return -ENOMEM;
+	}
 
+	tun->flows = buckets;
 	for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++)
-		INIT_HLIST_HEAD(&tun->flows[i]);
+		INIT_HLIST_HEAD((struct hlist_head *)
+				flex_array_get(buckets, i));
 
 	tun->ageing_time = TUN_FLOW_EXPIRE;
 	setup_timer(&tun->flow_gc_timer, tun_flow_cleanup, (unsigned long)tun);
 	mod_timer(&tun->flow_gc_timer,
 		  round_jiffies_up(jiffies + tun->ageing_time));
+
+	return 0;
 }
 
 static void tun_flow_uninit(struct tun_struct *tun)
 {
 	del_timer_sync(&tun->flow_gc_timer);
 	tun_flow_flush(tun);
+	flex_array_free(tun->flows);
 }
 
 /* Initialize net device. */
@@ -1674,7 +1696,10 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 			goto err_free_dev;
 
 		tun_net_init(dev);
-		tun_flow_init(tun);
+
+		err = tun_flow_init(tun);
+		if (err < 0)
+			goto err_free_dev;
 
 		dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
 				   TUN_USER_FEATURES | NETIF_F_HW_VLAN_CTAG_TX |
-- 
1.8.3.1



* [PATCH net-next 4/4] tuntap: Increase the number of queues in tun.
  2014-11-18 16:22 [PATCH net-net 0/4] Increase the limit of tuntap queues Pankaj Gupta
                   ` (2 preceding siblings ...)
  2014-11-18 16:22 ` [PATCH net-next 3/4] tuntap: reduce the size of tun_struct by using flex array Pankaj Gupta
@ 2014-11-18 16:22 ` Pankaj Gupta
  2014-11-19  1:43 ` [PATCH net-net 0/4] Increase the limit of tuntap queues Alexei Starovoitov
  2014-11-19 20:16 ` David Miller
  5 siblings, 0 replies; 19+ messages in thread
From: Pankaj Gupta @ 2014-11-18 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: davem, jasowang, mst, dgibson, vfalico, edumazet, vyasevic,
	hkchu, wuzhy, xemul, therbert, bhutchings, xii, stephen, jiri,
	sergei.shtylyov, Pankaj Gupta

Networking under KVM works best if we allocate a per-vCPU RX and TX
queue in a virtual NIC. This requires a per-vCPU queue on the host side.

It is now safe to increase the maximum number of queues. The preceding
patches:
        net: allow large number of rx queues
        tuntap: Reduce the size of tun_struct by using flex array
        tuntap: Accept tuntap max queue length as sysctl entry

made sure this won't cause failures due to high order memory
allocations. Increase the maximum to 256: this is the maximum number of
vCPUs KVM supports.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: David Gibson <dgibson@redhat.com>
---
 drivers/net/tun.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index a9b3eb4..2fb31b7 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -115,10 +115,11 @@ struct tap_filter {
 	unsigned char	addr[FLT_EXACT_COUNT][ETH_ALEN];
 };
 
-/* DEFAULT_MAX_NUM_RSS_QUEUES were chosen to let the rx/tx queues allocated for
- * the netdevice to be fit in one page. So we can make sure the success of
- * memory allocation. TODO: increase the limit. */
-#define MAX_TAP_QUEUES DEFAULT_MAX_NUM_RSS_QUEUES
+/* MAX_TAP_QUEUES 256 is chosen to allow rx/tx queues to be equal
+ * to max number of vCPUS in guest. Also, we are making sure here
+ * queue memory allocation do not fail.
+ */
+#define MAX_TAP_QUEUES 256
 #define MIN_TAP_QUEUES 1
 #define MAX_TAP_FLOWS  4096
 
-- 
1.8.3.1



* Re: [PATCH net-next 1/4] net: allow large number of rx queues
  2014-11-18 16:22 ` [PATCH net-next 1/4] net: allow large number of rx queues Pankaj Gupta
@ 2014-11-18 20:29   ` Cong Wang
  2014-11-20 16:31     ` Pankaj Gupta
  0 siblings, 1 reply; 19+ messages in thread
From: Cong Wang @ 2014-11-18 20:29 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-kernel, netdev, David Miller, Jason Wang,
	Michael S. Tsirkin, dgibson, vfalico, Eric Dumazet,
	Vladislav Yasevich, Jerry Chu, wuzhy, Pavel Emelyanov,
	Tom Herbert, bhutchings, xii, Stephen Hemminger,
	Jiří Pírko, sergei.shtylyov

On Tue, Nov 18, 2014 at 8:22 AM, Pankaj Gupta <pagupta@redhat.com> wrote:
>
> As vmalloc() adds overhead on a critical network path,
> __GFP_REPEAT flag is used with kzalloc() to do this fallback
> only when really needed.
>

Are you sure we need __GFP_REPEAT? We have vmalloc() as
fallback of kmalloc() in many places of networking (grep kvfree),
none of those I checked has this flag set.


* Re: [PATCH net-net 0/4] Increase the limit of tuntap queues
  2014-11-18 16:22 [PATCH net-net 0/4] Increase the limit of tuntap queues Pankaj Gupta
                   ` (3 preceding siblings ...)
  2014-11-18 16:22 ` [PATCH net-next 4/4] tuntap: Increase the number of queues in tun Pankaj Gupta
@ 2014-11-19  1:43 ` Alexei Starovoitov
  2014-11-19 20:16 ` David Miller
  5 siblings, 0 replies; 19+ messages in thread
From: Alexei Starovoitov @ 2014-11-19  1:43 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-kernel, netdev, David S. Miller, Jason Wang, mst, dgibson,
	Veaceslav Falico, Eric Dumazet, Vladislav Yasevich, Jerry Chu,
	wuzhy, Pavel Emelianov, Tom Herbert, bhutchings, xii,
	Stephen Hemminger, Jiří Pírko, sergei.shtylyov

On Tue, Nov 18, 2014 at 8:22 AM, Pankaj Gupta <pagupta@redhat.com> wrote:
> Networking under KVM works best if we allocate a per-vCPU rx and tx
> queue in a virtual NIC. This requires a per-vCPU queue on the host side.
...
> I have done some testing to find out any regression and with sample program
>  which creates tun/tap for single queue / multiqueue device and it seems to be
>  working fine. I will also post the performance numbers.

Sounds quite exciting.
Please share the recommended setup (queue assignments) and the
performance gains you're seeing.


* Re: [PATCH net-net 0/4] Increase the limit of tuntap queues
  2014-11-18 16:22 [PATCH net-net 0/4] Increase the limit of tuntap queues Pankaj Gupta
                   ` (4 preceding siblings ...)
  2014-11-19  1:43 ` [PATCH net-net 0/4] Increase the limit of tuntap queues Alexei Starovoitov
@ 2014-11-19 20:16 ` David Miller
  2014-11-19 20:44   ` Michael S. Tsirkin
  5 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2014-11-19 20:16 UTC (permalink / raw)
  To: pagupta
  Cc: linux-kernel, netdev, jasowang, mst, dgibson, vfalico, edumazet,
	vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings, xii,
	stephen, jiri, sergei.shtylyov

From: Pankaj Gupta <pagupta@redhat.com>
Date: Tue, 18 Nov 2014 21:52:54 +0530

> - Accept maximum number of queues as sysctl param so that any user space 
>   application like libvirt can use this value to limit number of queues. Also
>   Administrators can specify maximum number of queues by updating this sysctl
>   entry.

This is the only part I don't like.

Just let whoever has privileges to configure the tun device shoot
themselves in the foot if they want to by configuring "too many"
queues.

If the virtual entity runs itself out of resources by doing something
stupid, it's purely their problem.


* Re: [PATCH net-net 0/4] Increase the limit of tuntap queues
  2014-11-19 20:16 ` David Miller
@ 2014-11-19 20:44   ` Michael S. Tsirkin
  2014-11-23  5:22     ` Pankaj Gupta
  2014-11-23 10:46     ` Michael S. Tsirkin
  0 siblings, 2 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2014-11-19 20:44 UTC (permalink / raw)
  To: David Miller
  Cc: pagupta, linux-kernel, netdev, jasowang, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen, jiri, sergei.shtylyov

On Wed, Nov 19, 2014 at 03:16:28PM -0500, David Miller wrote:
> From: Pankaj Gupta <pagupta@redhat.com>
> Date: Tue, 18 Nov 2014 21:52:54 +0530
> 
> > - Accept maximum number of queues as sysctl param so that any user space 
> >   application like libvirt can use this value to limit number of queues. Also
> >   Administrators can specify maximum number of queues by updating this sysctl
> >   entry.
> 
> This is the only part I don't like.
> 
> Just let whoever has privileges to configure the tun device shoot
> themselves in the foot if they want to by configuring "too many"
> queues.
> 
> If the virtual entity runs itself out of resources by doing something
> stupid, it's purely their problem.

Well, it will run the host out of kernel memory, no?

-- 
MST


* Re: [PATCH net-next 1/4] net: allow large number of rx queues
  2014-11-18 20:29   ` Cong Wang
@ 2014-11-20 16:31     ` Pankaj Gupta
  0 siblings, 0 replies; 19+ messages in thread
From: Pankaj Gupta @ 2014-11-20 16:31 UTC (permalink / raw)
  To: Cong Wang
  Cc: linux-kernel, netdev, David Miller, Jason Wang,
	Michael S. Tsirkin, dgibson, vfalico, Eric Dumazet,
	Vladislav Yasevich, Jerry Chu, wuzhy, Pavel Emelyanov,
	Tom Herbert, bhutchings, xii, Stephen Hemminger,
	Jiří Pírko, sergei shtylyov


> On Tue, Nov 18, 2014 at 8:22 AM, Pankaj Gupta <pagupta@redhat.com> wrote:
> >
> > As vmalloc() adds overhead on a critical network path,
> > __GFP_REPEAT flag is used with kzalloc() to do this fallback
> > only when really needed.
> >
> 
> Are you sure we need __GFP_REPEAT? We have vmalloc() as
> fallback of kmalloc() in many places of networking (grep kvfree),
> none of those I checked has this flag set.

It's there in the netif_alloc_netdev_queues() function in the same file.
I also found it in some other places.
I think it's good to have.
> 
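
For reference, the tx-side code in netif_alloc_netdev_queues() that this reply
points to looks roughly like the following (paraphrased from net/core/dev.c of
that time, not a verbatim quote):

    static int netif_alloc_netdev_queues(struct net_device *dev)
    {
            unsigned int count = dev->num_tx_queues;
            struct netdev_queue *tx;
            size_t sz = count * sizeof(*tx);

            /* Ask the page allocator to try harder (__GFP_REPEAT) but stay
             * silent on failure (__GFP_NOWARN); fall back to vmalloc()ed
             * memory only if the contiguous allocation still fails. */
            tx = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
            if (!tx) {
                    tx = vzalloc(sz);
                    if (!tx)
                            return -ENOMEM;
            }
            dev->_tx = tx;

            /* ... per-queue initialization follows ... */
            return 0;
    }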


* Re: [PATCH net-net 0/4] Increase the limit of tuntap queues
  2014-11-19 20:44   ` Michael S. Tsirkin
@ 2014-11-23  5:22     ` Pankaj Gupta
  2014-11-23 10:46     ` Michael S. Tsirkin
  1 sibling, 0 replies; 19+ messages in thread
From: Pankaj Gupta @ 2014-11-23  5:22 UTC (permalink / raw)
  To: Michael S. Tsirkin, David Miller
  Cc: linux-kernel, netdev, jasowang, dgibson, vfalico, edumazet,
	vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings, xii,
	stephen, jiri, sergei shtylyov


> On Wed, Nov 19, 2014 at 03:16:28PM -0500, David Miller wrote:
> > From: Pankaj Gupta <pagupta@redhat.com>
> > Date: Tue, 18 Nov 2014 21:52:54 +0530
> > 
> > > - Accept maximum number of queues as sysctl param so that any user space
> > >   application like libvirt can use this value to limit number of queues.
> > >   Also
> > >   Administrators can specify maximum number of queues by updating this
> > >   sysctl
> > >   entry.
> > 
> > This is the only part I don't like.
> > 
> > Just let whoever has privileges to configure the tun device shoot
> > themselves in the foot if they want to by configuring "too many"
> > queues.
> > 
> > If the virtual entity runs itself out of resources by doing something
> > stupid, it's purely their problem.

We can configure some fixed number of queues, like 16 or so, at the start
if multiqueue is enabled, and then if somebody needs more than that, they
can increase it on their own.

If we don't want anyone to directly increase the number of queues, we can add
an ioctl command for increasing the number of queues?

Suggestions are welcome here.
> 
> Well it will run host out of kernel, no?
> 
> --
> MST
> 


* Re: [PATCH net-net 0/4] Increase the limit of tuntap queues
  2014-11-19 20:44   ` Michael S. Tsirkin
  2014-11-23  5:22     ` Pankaj Gupta
@ 2014-11-23 10:46     ` Michael S. Tsirkin
  2014-11-23 18:43       ` David Miller
  2014-11-24  3:23       ` Jason Wang
  1 sibling, 2 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2014-11-23 10:46 UTC (permalink / raw)
  To: David Miller
  Cc: pagupta, linux-kernel, netdev, jasowang, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen, jiri, sergei.shtylyov

On Wed, Nov 19, 2014 at 10:44:27PM +0200, Michael S. Tsirkin wrote:
> On Wed, Nov 19, 2014 at 03:16:28PM -0500, David Miller wrote:
> > From: Pankaj Gupta <pagupta@redhat.com>
> > Date: Tue, 18 Nov 2014 21:52:54 +0530
> > 
> > > - Accept maximum number of queues as sysctl param so that any user space 
> > >   application like libvirt can use this value to limit number of queues. Also
> > >   Administrators can specify maximum number of queues by updating this sysctl
> > >   entry.
> > 
> > This is the only part I don't like.
> > 
> > Just let whoever has privileges to configure the tun device shoot
> > themselves in the foot if they want to by configuring "too many"
> > queues.
> > 
> > If the virtual entity runs itself out of resources by doing something
> > stupid, it's purely their problem.
> 
> Well, it will run the host out of kernel memory, no?


To clarify:

At the moment attaching/detaching queues is an unprivileged operation.

Shouldn't we worry that an application can cause large
allocations, and provide a way to limit these?

David, could you comment on this please?

> -- 
> MST


* Re: [PATCH net-net 0/4] Increase the limit of tuntap queues
  2014-11-23 10:46     ` Michael S. Tsirkin
@ 2014-11-23 18:43       ` David Miller
  2014-11-23 20:30         ` Michael S. Tsirkin
  2014-11-24  3:23       ` Jason Wang
  1 sibling, 1 reply; 19+ messages in thread
From: David Miller @ 2014-11-23 18:43 UTC (permalink / raw)
  To: mst
  Cc: pagupta, linux-kernel, netdev, jasowang, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen, jiri, sergei.shtylyov

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Sun, 23 Nov 2014 12:46:23 +0200

> At the moment attaching/detaching queues is an unprivileged operation.
> 
> Shouldn't we worry that an application can cause large
> allocations, and provide a way to limit these?
> 
> David, could you comment on this please?

I don't want arbitrary limits imposed.

Where does this "application" run?  If it's in the host, then who
cares?  If they suck up all of their available memory with queue
resources, it's their problem.


* Re: [PATCH net-net 0/4] Increase the limit of tuntap queues
  2014-11-23 18:43       ` David Miller
@ 2014-11-23 20:30         ` Michael S. Tsirkin
  2014-11-24  1:23           ` David Miller
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2014-11-23 20:30 UTC (permalink / raw)
  To: David Miller
  Cc: pagupta, linux-kernel, netdev, jasowang, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen, jiri, sergei.shtylyov

On Sun, Nov 23, 2014 at 01:43:23PM -0500, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Sun, 23 Nov 2014 12:46:23 +0200
> 
> > At the moment attaching/detaching queues is an unprivileged operation.
> > 
> > Shouldn't we worry that an application can cause large
> > allocations, and provide a way to limit these?
> > 
> > David, could you comment on this please?
> 
> I don't want arbitrary limits imposed.
> 
> Where does this "application" run?  If it's in the host, then who
> cares?  If they suck up all of their available memory with queue
> resources, it's their problem.

qemu runs in the host, but it's unprivileged: it gets
passed tun FDs by a privileged daemon, and it only
has the rights to some operations,
in particular to attach and detach queues.

The assumption always was that this operation is safe
and can't make kernel run out of resources.


-- 
MST
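
For illustration (not part of this thread), a minimal sketch of the fd hand-off
described above: a privileged daemon passes an already-created tun fd to an
unprivileged process over an AF_UNIX socket with SCM_RIGHTS; the function name
and framing byte are hypothetical:

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Send one open file descriptor (e.g. a tun queue fd created with
     * TUNSETIFF) over a connected AF_UNIX socket using SCM_RIGHTS. */
    static int send_tun_fd(int unix_sock, int tun_fd)
    {
            char byte = 'F';
            struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
            union {
                    struct cmsghdr hdr;
                    char buf[CMSG_SPACE(sizeof(int))];
            } u;
            struct msghdr msg = {
                    .msg_iov = &iov,
                    .msg_iovlen = 1,
                    .msg_control = u.buf,
                    .msg_controllen = sizeof(u.buf),
            };
            struct cmsghdr *cmsg;

            memset(&u, 0, sizeof(u));
            cmsg = CMSG_FIRSTHDR(&msg);
            cmsg->cmsg_level = SOL_SOCKET;
            cmsg->cmsg_type = SCM_RIGHTS;
            cmsg->cmsg_len = CMSG_LEN(sizeof(int));
            memcpy(CMSG_DATA(cmsg), &tun_fd, sizeof(int));

            /* The receiver gets its own fd referring to the same tun queue
             * and can only perform the operations that fd allows, such as
             * TUNSETQUEUE attach/detach. */
            return sendmsg(unix_sock, &msg, 0) == 1 ? 0 : -1;
    }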


* Re: [PATCH net-net 0/4] Increase the limit of tuntap queues
  2014-11-23 20:30         ` Michael S. Tsirkin
@ 2014-11-24  1:23           ` David Miller
  2014-11-24  8:02             ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2014-11-24  1:23 UTC (permalink / raw)
  To: mst
  Cc: pagupta, linux-kernel, netdev, jasowang, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen, jiri, sergei.shtylyov

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Sun, 23 Nov 2014 22:30:32 +0200

> qemu runs in the host, but it's unprivileged: it gets
> passed tun FDs by a privileged daemon, and it only
> has the rights to some operations,
> in particular to attach and detach queues.
> 
> The assumption always was that this operation is safe
> and can't make kernel run out of resources.

This creates a rather ridiculous situation in my opinion.

Configuring a network device is a privileged operation, the daemon
should be setting this thing up.

In no other context would we have to worry about something like this.


* Re: [PATCH net-net 0/4] Increase the limit of tuntap queues
  2014-11-23 10:46     ` Michael S. Tsirkin
  2014-11-23 18:43       ` David Miller
@ 2014-11-24  3:23       ` Jason Wang
  2014-11-24  7:55         ` Michael S. Tsirkin
  1 sibling, 1 reply; 19+ messages in thread
From: Jason Wang @ 2014-11-24  3:23 UTC (permalink / raw)
  To: Michael S. Tsirkin, David Miller
  Cc: pagupta, linux-kernel, netdev, dgibson, vfalico, edumazet,
	vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings, xii,
	stephen, jiri, sergei.shtylyov

On 11/23/2014 06:46 PM, Michael S. Tsirkin wrote:
> On Wed, Nov 19, 2014 at 10:44:27PM +0200, Michael S. Tsirkin wrote:
>> > On Wed, Nov 19, 2014 at 03:16:28PM -0500, David Miller wrote:
>>> > > From: Pankaj Gupta <pagupta@redhat.com>
>>> > > Date: Tue, 18 Nov 2014 21:52:54 +0530
>>> > > 
>>>> > > > - Accept maximum number of queues as sysctl param so that any user space 
>>>> > > >   application like libvirt can use this value to limit number of queues. Also
>>>> > > >   Administrators can specify maximum number of queues by updating this sysctl
>>>> > > >   entry.
>>> > > 
>>> > > This is the only part I don't like.
>>> > > 
>>> > > Just let whoever has privileges to configure the tun device shoot
>>> > > themselves in the foot if they want to by configuring "too many"
>>> > > queues.
>>> > > 
>>> > > If the virtual entity runs itself out of resources by doing something
>>> > > stupid, it's purely their problem.
>> > 
>> > Well, it will run the host out of kernel memory, no?
> To clarify:
>
> At the moment attaching/detaching queues is an unprivileged operation.
>
> Shouldn't we worry that an application can cause large
> allocations, and provide a way to limit these?

But creating new queues (TUNSETIFF) is privileged. There's no way for
unprivileged user to allocate more resources. So we are safe here?
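
For illustration (not part of this thread), a minimal sketch of the privilege
split described here: queue creation via TUNSETIFF versus attach/detach of an
existing queue via TUNSETQUEUE; names and error handling are simplified:

    #include <fcntl.h>
    #include <linux/if.h>
    #include <linux/if_tun.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    /* Privileged step: create a queue of a multiqueue tap device.
     * TUNSETIFF on a new device requires CAP_NET_ADMIN. */
    int create_tap_queue(const char *name)
    {
            struct ifreq ifr;
            int fd = open("/dev/net/tun", O_RDWR);

            if (fd < 0)
                    return -1;
            memset(&ifr, 0, sizeof(ifr));
            ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_MULTI_QUEUE;
            strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);
            if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
                    close(fd);
                    return -1;
            }
            return fd;
    }

    /* Unprivileged step: a process that was handed queue_fd can only
     * detach or re-attach that queue, not create new ones. */
    int set_queue_attached(int queue_fd, int attach)
    {
            struct ifreq ifr;

            memset(&ifr, 0, sizeof(ifr));
            ifr.ifr_flags = attach ? IFF_ATTACH_QUEUE : IFF_DETACH_QUEUE;
            return ioctl(queue_fd, TUNSETQUEUE, &ifr);
    }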


* Re: [PATCH net-net 0/4] Increase the limit of tuntap queues
  2014-11-24  3:23       ` Jason Wang
@ 2014-11-24  7:55         ` Michael S. Tsirkin
  0 siblings, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2014-11-24  7:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: David Miller, pagupta, linux-kernel, netdev, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen, jiri, sergei.shtylyov

On Mon, Nov 24, 2014 at 11:23:05AM +0800, Jason Wang wrote:
> On 11/23/2014 06:46 PM, Michael S. Tsirkin wrote:
> > On Wed, Nov 19, 2014 at 10:44:27PM +0200, Michael S. Tsirkin wrote:
> >> > On Wed, Nov 19, 2014 at 03:16:28PM -0500, David Miller wrote:
> >>> > > From: Pankaj Gupta <pagupta@redhat.com>
> >>> > > Date: Tue, 18 Nov 2014 21:52:54 +0530
> >>> > > 
> >>>> > > > - Accept maximum number of queues as sysctl param so that any user space 
> >>>> > > >   application like libvirt can use this value to limit number of queues. Also
> >>>> > > >   Administrators can specify maximum number of queues by updating this sysctl
> >>>> > > >   entry.
> >>> > > 
> >>> > > This is the only part I don't like.
> >>> > > 
> >>> > > Just let whoever has privileges to configure the tun device shoot
> >>> > > themselves in the foot if they want to by configuring "too many"
> >>> > > queues.
> >>> > > 
> >>> > > If the virtual entity runs itself out of resources by doing something
> >>> > > stupid, it's purely their problem.
> >> > 
> >> > Well, it will run the host out of kernel memory, no?
> > To clarify:
> >
> > At the moment attaching/detaching queues is an unprivileged operation.
> >
> > Shouldn't we worry that an application can cause large
> > allocations, and provide a way to limit these?
> 
> But creating new queues (TUNSETIFF) is privileged. There's no way for
> unprivileged user to allocate more resources. So we are safe here?

Hmm, that's true, I think I was confused.
Thanks for setting me straight.

-- 
MST


* Re: [PATCH net-net 0/4] Increase the limit of tuntap queues
  2014-11-24  1:23           ` David Miller
@ 2014-11-24  8:02             ` Michael S. Tsirkin
  2014-11-24 14:28               ` Pankaj Gupta
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2014-11-24  8:02 UTC (permalink / raw)
  To: David Miller
  Cc: pagupta, linux-kernel, netdev, jasowang, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen, jiri, sergei.shtylyov

On Sun, Nov 23, 2014 at 08:23:21PM -0500, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Sun, 23 Nov 2014 22:30:32 +0200
> 
> > qemu runs in the host, but it's unprivileged: it gets
> > passed tun FDs by a privileged daemon, and it only
> > has the rights to some operations,
> > in particular to attach and detach queues.
> > 
> > The assumption always was that this operation is safe
> > and can't make kernel run out of resources.
> 
> > This creates a rather ridiculous situation in my opinion.
> 
> Configuring a network device is a privileged operation, the daemon
> should be setting this thing up.
> 
> In no other context would we have to worry about something like this.

Right.  Jason corrected me.  I got it wrong:
what qemu does is TUNSETQUEUE and that needs to get a queue
that's already initialized by the daemon.

To create new queues the daemon calls TUNSETIFF,
and that can already be used to create new devices,
so it's a privileged operation.

This means it's safe to just drop the restriction,
exactly as you suggested originally.
-- 
MST


* Re: [PATCH net-net 0/4] Increase the limit of tuntap queues
  2014-11-24  8:02             ` Michael S. Tsirkin
@ 2014-11-24 14:28               ` Pankaj Gupta
  0 siblings, 0 replies; 19+ messages in thread
From: Pankaj Gupta @ 2014-11-24 14:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, linux-kernel, netdev, jasowang, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen, jiri, sergei shtylyov


> On Sun, Nov 23, 2014 at 08:23:21PM -0500, David Miller wrote:
> > From: "Michael S. Tsirkin" <mst@redhat.com>
> > Date: Sun, 23 Nov 2014 22:30:32 +0200
> > 
> > > qemu runs in the host, but it's unprivileged: it gets
> > > passed tun FDs by a privileged daemon, and it only
> > > has the rights to some operations,
> > > in particular to attach and detach queues.
> > > 
> > > The assumption always was that this operation is safe
> > > and can't make kernel run out of resources.
> > 
> > This creates a rather ridiculous situation in my opinion.
> > 
> > Configuring a network device is a privileged operation, the daemon
> > should be setting this thing up.
> > 
> > In no other context would we have to worry about something like this.
> 
> Right.  Jason corrected me.  I got it wrong:
> what qemu does is TUNSETQUEUE and that needs to get a queue
> that's already initialized by the daemon.
> 
> To create new queues the daemon calls TUNSETIFF,
> and that can already be used to create new devices,
> so it's a privileged operation.
> 
> This means it's safe to just drop the restriction,
> exactly as you suggested originally.

I will drop patch 2, which adds the sysctl entry, and will send a v2 with the
other patches.

Thanks,
Pankaj

> --
> MST
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

