* [RFC 0/4] Increase the limit of tuntap queues
@ 2014-08-18 13:37 Pankaj Gupta
  2014-08-18 13:37 ` [RFC 1/4] net: allow large number of rx queues Pankaj Gupta
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Pankaj Gupta @ 2014-08-18 13:37 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: davem, jasowang, mst, dgibson, vfalico, edumazet, vyasevic,
	hkchu, wuzhy, xemul, therbert, bhutchings, xii, stephen,
	Pankaj Gupta

Networking under KVM works best if we allocate a per-vCPU rx and tx
queue in a virtual NIC; this requires a per-vCPU queue on the host side.
Modern physical NICs already support a large number of queues. To scale
a vNIC to run one queue per vCPU, up to the maximum number of vCPUs, we
need to increase the number of queues supported by tuntap.

This series increases the limit of tuntap queues. The original work was
done by 'jasowang@redhat.com'; I am using the patch series at
'https://lkml.org/lkml/2013/6/19/29' as a reference. As per the
discussion in that series:

There were two reasons which prevented us from increasing the number of tun queues:

- The netdev_queue array in the netdevice is allocated through kmalloc(),
  which may require a high-order memory allocation when we have many queues.
  E.g. sizeof(struct netdev_queue) is 320 bytes, so a high-order allocation
  is needed as soon as the device has more than 16 queues (see the rough
  arithmetic after this list).

- We store the hash buckets directly in tun_struct, which makes tun_struct
  very large; that high-order memory allocation fails easily when memory is
  fragmented.
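
For a rough sense of the sizes involved (figures taken from the points above;
exact values depend on the kernel configuration):

  sizeof(struct netdev_queue) ~= 320 bytes
    16 queues:  320 * 16  =  5120 bytes  -> order-1 (larger than a 4096-byte page)
    256 queues: 320 * 256 = 81920 bytes  -> order-5 allocation

Order-1 and larger allocations need physically contiguous pages and can fail
on a fragmented system, which is why the vzalloc() fallback and the flex
array below are needed.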

Commit 60877a32bce00041528576e6b8df5abe9251fa73 already increased the number
of tx queues by falling back to vzalloc() when kmalloc() fails.

This series tries to address the following issues:

- Increase the number of rx queues in the same way as is already done for tx
  queues, by falling back to vzalloc() when allocation with kmalloc() fails.

- Switch to a flex array to implement the flow caches, avoiding higher-order
  allocations.

- Publish the maximum number of queues as a read-only module_param so that a
  user-space application like libvirt can use this value to limit the number
  of queues. Administrators can also specify the number of queues at module
  load time.

- Increase the number of queues to 256, which is equal to the maximum number
  of vCPUs allowed in a guest.

I have done some testing to look for regressions, using a sample program
which creates tun/tap devices in single-queue and multiqueue mode, and it
seems to be working fine.
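
For reference, a minimal sketch of such a test program (hypothetical; error
handling trimmed): every additional open of /dev/net/tun with
IFF_MULTI_QUEUE and the same interface name attaches one more queue to the
device.

  #include <fcntl.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/if.h>
  #include <linux/if_tun.h>

  /* Open one queue of a multiqueue tap device; call repeatedly with the
   * same name to attach further queues. Returns the queue fd or -1. */
  static int tap_open_queue(const char *name)
  {
          struct ifreq ifr;
          int fd = open("/dev/net/tun", O_RDWR);

          if (fd < 0)
                  return -1;
          memset(&ifr, 0, sizeof(ifr));
          ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_MULTI_QUEUE;
          strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);
          if (ioctl(fd, TUNSETIFF, &ifr) < 0)
                  return -1;
          return fd;
  }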

  tuntap: Increase the number of queues in tun
  tuntap: Reduce the size of tun_struct by using flex array
  tuntap: Publish tuntap max queue length as module_param
  net: allow large number of rx queues

 drivers/net/tun.c |   71 ++++++++++++++++++++++++++++++++++++++++--------------
 net/core/dev.c    |   20 +++++++++------
 2 files changed, 66 insertions(+), 25 deletions(-)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC 1/4] net: allow large number of rx queues
  2014-08-18 13:37 [RFC 0/4] Increase the limit of tuntap queues Pankaj Gupta
@ 2014-08-18 13:37 ` Pankaj Gupta
  2014-08-18 17:43   ` Sergei Shtylyov
  2014-08-18 13:37 ` [RFC 2/4] tuntap: Publish tuntap maximum number of queues as module_param Pankaj Gupta
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Pankaj Gupta @ 2014-08-18 13:37 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: davem, jasowang, mst, dgibson, vfalico, edumazet, vyasevic,
	hkchu, wuzhy, xemul, therbert, bhutchings, xii, stephen,
	Pankaj Gupta

netif_alloc_rx_queues() uses kcalloc() to allocate memory
for the "struct netdev_rx_queue *_rx" array.
For a large rx queue allocation kcalloc() might fail, so
this patch adds a fallback to vzalloc().
A similar approach is already used for tx queue allocation
in netif_alloc_netdev_queues().

Avoiding the failure of high-order memory allocations with
the help of vzalloc() allows us to do large rx and tx queue
allocations, which in turn lets us increase the number of
queues in tun.

As vmalloc() adds overhead on a critical network path, the
__GFP_REPEAT flag is used with kzalloc() so that the fallback
to vzalloc() happens only when really needed.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Gibson <dgibson@redhat.com>
---
 net/core/dev.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1c15b18..a455a02 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5942,17 +5942,24 @@ void netif_stacked_transfer_operstate(const struct net_device *rootdev,
 EXPORT_SYMBOL(netif_stacked_transfer_operstate);
 
 #ifdef CONFIG_SYSFS
+static void netif_free_rx_queues(struct net_device *dev)
+{
+	kvfree(dev->_rx);
+}
+
 static int netif_alloc_rx_queues(struct net_device *dev)
 {
 	unsigned int i, count = dev->num_rx_queues;
 	struct netdev_rx_queue *rx;
-
+	size_t sz = count * sizeof(*rx);
 	BUG_ON(count < 1);
 
-	rx = kcalloc(count, sizeof(struct netdev_rx_queue), GFP_KERNEL);
-	if (!rx)
-		return -ENOMEM;
-
+	rx = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
+	if (!rx) {
+		rx = vzalloc(sz);
+		if (!rx)
+			return -ENOMEM;
+	}
 	dev->_rx = rx;
 
 	for (i = 0; i < count; i++)
@@ -6581,9 +6588,8 @@ void free_netdev(struct net_device *dev)
 
 	netif_free_tx_queues(dev);
 #ifdef CONFIG_SYSFS
-	kfree(dev->_rx);
+	netif_free_rx_queues(dev);
 #endif
-
 	kfree(rcu_dereference_protected(dev->ingress_queue, 1));
 
 	/* Flush device addresses */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC 2/4] tuntap: Publish tuntap maximum number of queues as module_param
  2014-08-18 13:37 [RFC 0/4] Increase the limit of tuntap queues Pankaj Gupta
  2014-08-18 13:37 ` [RFC 1/4] net: allow large number of rx queues Pankaj Gupta
@ 2014-08-18 13:37 ` Pankaj Gupta
  2014-08-20 10:58   ` Jiri Pirko
  2014-08-18 13:37 ` [RFC 3/4] tuntap: reduce the size of tun_struct by using flex array Pankaj Gupta
  2014-08-18 13:37 ` [RFC 4/4] tuntap: Increase the number of queues in tun Pankaj Gupta
  3 siblings, 1 reply; 19+ messages in thread
From: Pankaj Gupta @ 2014-08-18 13:37 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: davem, jasowang, mst, dgibson, vfalico, edumazet, vyasevic,
	hkchu, wuzhy, xemul, therbert, bhutchings, xii, stephen,
	Pankaj Gupta

 This patch publishes the maximum number of tun/tap queues allocated as a
 read-only module parameter which a user-space application like libvirt
 can use to limit the maximum number of queues. The value of a read-only
 module parameter can only be set at module load time. If no value is set
 at module load time, a default of 256 is used, which is equal to the
 maximum number of vCPUs allowed by KVM.

 Administrators can specify the maximum number of queues only at driver
 module load time.
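
 Since module_param(..., S_IRUGO) exposes the value read-only in sysfs, an
 application such as libvirt could read the limit roughly as sketched below
 (hypothetical helper, assuming this parameter is merged as proposed), while
 an administrator would set it at load time, e.g. "modprobe tun
 max_tap_queues=128".

	#include <stdio.h>

	/* Hypothetical helper: read the tun module's max_tap_queues limit
	 * from sysfs; returns the limit, or -1 if it is not available. */
	static int read_max_tap_queues(void)
	{
		FILE *f = fopen("/sys/module/tun/parameters/max_tap_queues", "r");
		int val = -1;

		if (!f)
			return -1;
		if (fscanf(f, "%d", &val) != 1)
			val = -1;
		fclose(f);
		return val;
	}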

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/net/tun.c |   13 +++++++++++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index acaaf67..1f518e2 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -119,6 +119,9 @@ struct tap_filter {
 
 #define TUN_FLOW_EXPIRE (3 * HZ)
 
+static int max_tap_queues = MAX_TAP_QUEUES;
+module_param(max_tap_queues, int, S_IRUGO);
+
 /* A tun_file connects an open character device to a tuntap netdevice. It
  * also contains all socket related structures (except sock_fprog and tap_filter)
  * to serve as one transmit queue for tuntap device. The sock_fprog and
@@ -545,7 +548,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file, bool skip_filte
 
 	err = -E2BIG;
 	if (!tfile->detached &&
-	    tun->numqueues + tun->numdisabled == MAX_TAP_QUEUES)
+	    tun->numqueues + tun->numdisabled == max_tap_queues)
 		goto out;
 
 	err = 0;
@@ -1609,7 +1612,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		char *name;
 		unsigned long flags = 0;
 		int queues = ifr->ifr_flags & IFF_MULTI_QUEUE ?
-			     MAX_TAP_QUEUES : 1;
+			     max_tap_queues : 1;
 
 		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
@@ -2327,6 +2330,12 @@ static int __init tun_init(void)
 	pr_info("%s, %s\n", DRV_DESCRIPTION, DRV_VERSION);
 	pr_info("%s\n", DRV_COPYRIGHT);
 
+	if (max_tap_queues > MAX_TAP_QUEUES || max_tap_queues <= 0) {
+		printk(KERN_WARNING "max_tap_queues parameter value either too large"
+		 " or too small forcing default value: %d\n", MAX_TAP_QUEUES);
+		max_tap_queues = MAX_TAP_QUEUES;
+	}
+
 	ret = rtnl_link_register(&tun_link_ops);
 	if (ret) {
 		pr_err("Can't register link_ops\n");
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC 3/4] tuntap: reduce the size of tun_struct by using flex array
  2014-08-18 13:37 [RFC 0/4] Increase the limit of tuntap queues Pankaj Gupta
  2014-08-18 13:37 ` [RFC 1/4] net: allow large number of rx queues Pankaj Gupta
  2014-08-18 13:37 ` [RFC 2/4] tuntap: Publish tuntap maximum number of queues as module_param Pankaj Gupta
@ 2014-08-18 13:37 ` Pankaj Gupta
  2014-08-25  1:35   ` David Gibson
  2014-08-18 13:37 ` [RFC 4/4] tuntap: Increase the number of queues in tun Pankaj Gupta
  3 siblings, 1 reply; 19+ messages in thread
From: Pankaj Gupta @ 2014-08-18 13:37 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: davem, jasowang, mst, dgibson, vfalico, edumazet, vyasevic,
	hkchu, wuzhy, xemul, therbert, bhutchings, xii, stephen,
	Pankaj Gupta

This patch switches to a flex array to implement the flow caches, which brings
several advantages:

- Reduces the size of the tun_struct structure, which allows us to increase
  the upper limit of queues in the future.
- Avoids higher-order memory allocation. This will be useful when switching to
  pure hashing in the flow cache, which may demand a larger array in the future.

After this patch, the size of tun_struct on x86_64 is reduced from 8512 to
328 bytes.
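
The numbers add up as follows (assuming TUN_NUM_FLOW_ENTRIES is 1024 and an
8-byte struct hlist_head on x86_64, which matches the figures above):

  inline bucket array:  1024 * 8 = 8192 bytes embedded in tun_struct
  after this patch:     a single 8-byte flex_array pointer,
                        i.e. 8512 - 8192 + 8 = 328 bytes

The flex array keeps its elements in page-sized parts, so the bucket table
itself is also built entirely from order-0 allocations.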

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/net/tun.c |   49 +++++++++++++++++++++++++++++++++++++------------
 1 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index acaaf67..7dd495f 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -65,6 +65,7 @@
 #include <linux/nsproxy.h>
 #include <linux/virtio_net.h>
 #include <linux/rcupdate.h>
+#include <linux/flex_array.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
@@ -186,7 +187,7 @@ struct tun_struct {
 	int debug;
 #endif
 	spinlock_t lock;
-	struct hlist_head flows[TUN_NUM_FLOW_ENTRIES];
+	struct flex_array *flows;
 	struct timer_list flow_gc_timer;
 	unsigned long ageing_time;
 	unsigned int numdisabled;
@@ -247,10 +248,11 @@ static void tun_flow_flush(struct tun_struct *tun)
 
 	spin_lock_bh(&tun->lock);
 	for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++) {
+		struct hlist_head *h = flex_array_get(tun->flows, i);
 		struct tun_flow_entry *e;
 		struct hlist_node *n;
 
-		hlist_for_each_entry_safe(e, n, &tun->flows[i], hash_link)
+		hlist_for_each_entry_safe(e, n, h, hash_link)
 			tun_flow_delete(tun, e);
 	}
 	spin_unlock_bh(&tun->lock);
@@ -262,10 +264,11 @@ static void tun_flow_delete_by_queue(struct tun_struct *tun, u16 queue_index)
 
 	spin_lock_bh(&tun->lock);
 	for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++) {
+		struct hlist_head *h = flex_array_get(tun->flows, i);
 		struct tun_flow_entry *e;
 		struct hlist_node *n;
 
-		hlist_for_each_entry_safe(e, n, &tun->flows[i], hash_link) {
+		hlist_for_each_entry_safe(e, n, h, hash_link) {
 			if (e->queue_index == queue_index)
 				tun_flow_delete(tun, e);
 		}
@@ -285,10 +288,11 @@ static void tun_flow_cleanup(unsigned long data)
 
 	spin_lock_bh(&tun->lock);
 	for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++) {
+		struct hlist_head *h = flex_array_get(tun->flows, i);
 		struct tun_flow_entry *e;
 		struct hlist_node *n;
 
-		hlist_for_each_entry_safe(e, n, &tun->flows[i], hash_link) {
+		hlist_for_each_entry_safe(e, n, h, hash_link) {
 			unsigned long this_timer;
 			count++;
 			this_timer = e->updated + delay;
@@ -315,7 +319,7 @@ static void tun_flow_update(struct tun_struct *tun, u32 rxhash,
 	if (!rxhash)
 		return;
 	else
-		head = &tun->flows[tun_hashfn(rxhash)];
+		head = flex_array_get(tun->flows, tun_hashfn(rxhash));
 
 	rcu_read_lock();
 
@@ -378,7 +382,8 @@ static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb,
 
 	txq = skb_get_hash(skb);
 	if (txq) {
-		e = tun_flow_find(&tun->flows[tun_hashfn(txq)], txq);
+		e = tun_flow_find(flex_array_get(tun->flows,
+						 tun_hashfn(txq)), txq);
 		if (e) {
 			tun_flow_save_rps_rxhash(e, txq);
 			txq = e->queue_index;
@@ -758,8 +763,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 		rxhash = skb_get_hash(skb);
 		if (rxhash) {
 			struct tun_flow_entry *e;
-			e = tun_flow_find(&tun->flows[tun_hashfn(rxhash)],
-					rxhash);
+			e = tun_flow_find(flex_array_get(tun->flows,
+							 tun_hashfn(rxhash)), rxhash);
 			if (e)
 				tun_flow_save_rps_rxhash(e, rxhash);
 		}
@@ -894,23 +899,40 @@ static const struct net_device_ops tap_netdev_ops = {
 #endif
 };
 
-static void tun_flow_init(struct tun_struct *tun)
+static int tun_flow_init(struct tun_struct *tun)
 {
-	int i;
+	struct flex_array *buckets;
+	int i, err;
+
+	buckets = flex_array_alloc(sizeof(struct hlist_head),
+				   TUN_NUM_FLOW_ENTRIES, GFP_KERNEL);
+	if (!buckets)
+		return -ENOMEM;
+
+	err = flex_array_prealloc(buckets, 0, TUN_NUM_FLOW_ENTRIES, GFP_KERNEL);
+	if (err) {
+		flex_array_free(buckets);
+		return -ENOMEM;
+	}
 
+	tun->flows = buckets;
 	for (i = 0; i < TUN_NUM_FLOW_ENTRIES; i++)
-		INIT_HLIST_HEAD(&tun->flows[i]);
+		INIT_HLIST_HEAD((struct hlist_head *)
+				flex_array_get(buckets, i));
 
 	tun->ageing_time = TUN_FLOW_EXPIRE;
 	setup_timer(&tun->flow_gc_timer, tun_flow_cleanup, (unsigned long)tun);
 	mod_timer(&tun->flow_gc_timer,
 		  round_jiffies_up(jiffies + tun->ageing_time));
+
+	return 0;
 }
 
 static void tun_flow_uninit(struct tun_struct *tun)
 {
 	del_timer_sync(&tun->flow_gc_timer);
 	tun_flow_flush(tun);
+	flex_array_free(tun->flows);
 }
 
 /* Initialize net device. */
@@ -1659,7 +1681,10 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 			goto err_free_dev;
 
 		tun_net_init(dev);
-		tun_flow_init(tun);
+
+		err = tun_flow_init(tun);
+		if (err < 0)
+			goto err_free_dev;
 
 		dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
 				   TUN_USER_FEATURES | NETIF_F_HW_VLAN_CTAG_TX |
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC 4/4] tuntap: Increase the number of queues in tun
  2014-08-18 13:37 [RFC 0/4] Increase the limit of tuntap queues Pankaj Gupta
                   ` (2 preceding siblings ...)
  2014-08-18 13:37 ` [RFC 3/4] tuntap: reduce the size of tun_struct by using flex array Pankaj Gupta
@ 2014-08-18 13:37 ` Pankaj Gupta
  2014-08-25  1:37   ` David Gibson
  3 siblings, 1 reply; 19+ messages in thread
From: Pankaj Gupta @ 2014-08-18 13:37 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: davem, jasowang, mst, dgibson, vfalico, edumazet, vyasevic,
	hkchu, wuzhy, xemul, therbert, bhutchings, xii, stephen,
	Pankaj Gupta

Networking under KVM works best if we allocate a per-vCPU RX and TX
queue in a virtual NIC. This requires a per-vCPU queue on the host side.

It is now safe to increase the maximum number of queues.
The preceding patches:
	net: allow large number of rx queues
	tuntap: Reduce the size of tun_struct by using flex array
	tuntap: Publish tuntap max queue length as module_param
made sure this won't cause failures due to high-order memory
allocations. Increase the limit to 256: this is the maximum number of
vCPUs KVM supports.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/net/tun.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 98bad1f..893eba8 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -111,10 +111,11 @@ struct tap_filter {
 	unsigned char	addr[FLT_EXACT_COUNT][ETH_ALEN];
 };
 
-/* DEFAULT_MAX_NUM_RSS_QUEUES were chosen to let the rx/tx queues allocated for
- * the netdevice to be fit in one page. So we can make sure the success of
- * memory allocation. TODO: increase the limit. */
-#define MAX_TAP_QUEUES DEFAULT_MAX_NUM_RSS_QUEUES
+/* MAX_TAP_QUEUES is set to 256, the maximum number of vCPUs allowed
+ * in a KVM guest, so a queue pair can be allocated per vCPU. The
+ * preceding patches make sure queue memory allocation does not fail.
+ */
+#define MAX_TAP_QUEUES 256
 #define MAX_TAP_FLOWS  4096
 
 #define TUN_FLOW_EXPIRE (3 * HZ)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC 1/4] net: allow large number of rx queues
  2014-08-18 13:37 ` [RFC 1/4] net: allow large number of rx queues Pankaj Gupta
@ 2014-08-18 17:43   ` Sergei Shtylyov
  2014-08-19  5:15     ` Pankaj Gupta
  0 siblings, 1 reply; 19+ messages in thread
From: Sergei Shtylyov @ 2014-08-18 17:43 UTC (permalink / raw)
  To: Pankaj Gupta, linux-kernel, netdev
  Cc: davem, jasowang, mst, dgibson, vfalico, edumazet, vyasevic,
	hkchu, wuzhy, xemul, therbert, bhutchings, xii, stephen

Hello.

On 08/18/2014 05:37 PM, Pankaj Gupta wrote:

> netif_alloc_rx_queues() uses kcalloc() to allocate memory
> for "struct netdev_queue *_rx" array.
> If we are doing large rx queue allocation kcalloc() might
> fail, so this patch does a fallback to vzalloc().
> Similar implementation is done for tx queue allocation in
> netif_alloc_netdev_queues().

> We avoid failure of high order memory allocation
> with the help of vzalloc(), this allows us to do large
> rx and tx queue allocation which in turn helps us to
> increase the number of queues in tun.

> As vmalloc() adds overhead on a critical network path,
> __GFP_REPEAT flag is used with kzalloc() to do this fallback
> only when really needed.

> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Reviewed-by: David Gibson <dgibson@redhat.com>
> ---
>   net/core/dev.c | 20 +++++++++++++-------
>   1 file changed, 13 insertions(+), 7 deletions(-)

> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1c15b18..a455a02 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5942,17 +5942,24 @@ void netif_stacked_transfer_operstate(const struct net_device *rootdev,
>   EXPORT_SYMBOL(netif_stacked_transfer_operstate);
>
>   #ifdef CONFIG_SYSFS
> +static void netif_free_rx_queues(struct net_device *dev)
> +{
> +	kvfree(dev->_rx);
> +}
> +
>   static int netif_alloc_rx_queues(struct net_device *dev)
>   {
>   	unsigned int i, count = dev->num_rx_queues;
>   	struct netdev_rx_queue *rx;
> -
> +	size_t sz = count * sizeof(*rx);

   Please keep an empty line after declarations.

>   	BUG_ON(count < 1);
>

WBR, Sergei


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 1/4] net: allow large number of rx queues
  2014-08-18 17:43   ` Sergei Shtylyov
@ 2014-08-19  5:15     ` Pankaj Gupta
  0 siblings, 0 replies; 19+ messages in thread
From: Pankaj Gupta @ 2014-08-19  5:15 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: linux-kernel, netdev, davem, jasowang, mst, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen


> Hello.
> 
> On 08/18/2014 05:37 PM, Pankaj Gupta wrote:
> 
> > netif_alloc_rx_queues() uses kcalloc() to allocate memory
> > for "struct netdev_queue *_rx" array.
> > If we are doing large rx queue allocation kcalloc() might
> > fail, so this patch does a fallback to vzalloc().
> > Similar implementation is done for tx queue allocation in
> > netif_alloc_netdev_queues().
> 
> > We avoid failure of high order memory allocation
> > with the help of vzalloc(), this allows us to do large
> > rx and tx queue allocation which in turn helps us to
> > increase the number of queues in tun.
> 
> > As vmalloc() adds overhead on a critical network path,
> > __GFP_REPEAT flag is used with kzalloc() to do this fallback
> > only when really needed.
> 
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > Reviewed-by: David Gibson <dgibson@redhat.com>
> > ---
> >   net/core/dev.c | 20 +++++++++++++-------
> >   1 file changed, 13 insertions(+), 7 deletions(-)
> 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 1c15b18..a455a02 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -5942,17 +5942,24 @@ void netif_stacked_transfer_operstate(const struct
> > net_device *rootdev,
> >   EXPORT_SYMBOL(netif_stacked_transfer_operstate);
> >
> >   #ifdef CONFIG_SYSFS
> > +static void netif_free_rx_queues(struct net_device *dev)
> > +{
> > +	kvfree(dev->_rx);
> > +}
> > +
> >   static int netif_alloc_rx_queues(struct net_device *dev)
> >   {
> >   	unsigned int i, count = dev->num_rx_queues;
> >   	struct netdev_rx_queue *rx;
> > -
> > +	size_t sz = count * sizeof(*rx);
> 
>    Please keep an empty line after declarations.
     Yes, I will add it. Thanks
> 
> >   	BUG_ON(count < 1);
> >
> 
> WBR, Sergei
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 2/4] tuntap: Publish tuntap maximum number of queues as module_param
  2014-08-18 13:37 ` [RFC 2/4] tuntap: Publish tuntap maximum number of queues as module_param Pankaj Gupta
@ 2014-08-20 10:58   ` Jiri Pirko
  2014-08-20 11:17     ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Jiri Pirko @ 2014-08-20 10:58 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-kernel, netdev, davem, jasowang, mst, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen

Mon, Aug 18, 2014 at 03:37:18PM CEST, pagupta@redhat.com wrote:
> This patch publishes maximum number of tun/tap queues allocated as a
> read_only module parameter which a user space application like libvirt
> can make use of to limit maximum number of queues. Value of read_only
> module parameter can be writable only at module load time. If no value is set
> at module load time a default value 256 is used which is equal to maximum number
> of vCPUS allowed by KVM.
>
> Administrator can specify maximum number of queues only at the driver
> module load time.
>
>Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
>---
> drivers/net/tun.c |   13 +++++++++++--
> 1 files changed, 11 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>index acaaf67..1f518e2 100644
>--- a/drivers/net/tun.c
>+++ b/drivers/net/tun.c
>@@ -119,6 +119,9 @@ struct tap_filter {
> 
> #define TUN_FLOW_EXPIRE (3 * HZ)
> 
>+static int max_tap_queues = MAX_TAP_QUEUES;
>+module_param(max_tap_queues, int, S_IRUGO);

Please do not introduce new module parameters. Please use other ways to
exchange values with userspace.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 2/4] tuntap: Publish tuntap maximum number of queues as module_param
  2014-08-20 10:58   ` Jiri Pirko
@ 2014-08-20 11:17     ` Michael S. Tsirkin
  2014-08-20 11:46       ` Jiri Pirko
  2014-08-21  4:30       ` Jason Wang
  0 siblings, 2 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2014-08-20 11:17 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Pankaj Gupta, linux-kernel, netdev, davem, jasowang, dgibson,
	vfalico, edumazet, vyasevic, hkchu, wuzhy, xemul, therbert,
	bhutchings, xii, stephen

On Wed, Aug 20, 2014 at 12:58:17PM +0200, Jiri Pirko wrote:
> Mon, Aug 18, 2014 at 03:37:18PM CEST, pagupta@redhat.com wrote:
> > This patch publishes maximum number of tun/tap queues allocated as a
> > read_only module parameter which a user space application like libvirt
> > can make use of to limit maximum number of queues. Value of read_only
> > module parameter can be writable only at module load time. If no value is set
> > at module load time a default value 256 is used which is equal to maximum number
> > of vCPUS allowed by KVM.
> >
> > Administrator can specify maximum number of queues only at the driver
> > module load time.
> >
> >Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> >---
> > drivers/net/tun.c |   13 +++++++++++--
> > 1 files changed, 11 insertions(+), 2 deletions(-)
> >
> >diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >index acaaf67..1f518e2 100644
> >--- a/drivers/net/tun.c
> >+++ b/drivers/net/tun.c
> >@@ -119,6 +119,9 @@ struct tap_filter {
> > 
> > #define TUN_FLOW_EXPIRE (3 * HZ)
> > 
> >+static int max_tap_queues = MAX_TAP_QUEUES;
> >+module_param(max_tap_queues, int, S_IRUGO);
> 
> Please do not introduce new module paramaters. Please other ways to
> interchange values with userspace.

I suggested this initially, but thinking more about it, I agree.

It's a global limit (necessary to limit memory utilization by
userspace), but it should be possible to change it
after module load.

Additionally, userspace that has the FD should be able to
retrieve the value without guessing that the FD is
for the tun device (and not e.g. macvtap).
To retrieve the value, an ioctl is probably the
cleanest approach.

To set it, how about a sysctl? I think the limit can also apply to
all devices, not just tun.

-- 
MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 2/4] tuntap: Publish tuntap maximum number of queues as module_param
  2014-08-20 11:17     ` Michael S. Tsirkin
@ 2014-08-20 11:46       ` Jiri Pirko
  2014-08-20 11:49         ` Michael S. Tsirkin
  2014-08-21  4:30       ` Jason Wang
  1 sibling, 1 reply; 19+ messages in thread
From: Jiri Pirko @ 2014-08-20 11:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Pankaj Gupta, linux-kernel, netdev, davem, jasowang, dgibson,
	vfalico, edumazet, vyasevic, hkchu, wuzhy, xemul, therbert,
	bhutchings, xii, stephen

Wed, Aug 20, 2014 at 01:17:24PM CEST, mst@redhat.com wrote:
>On Wed, Aug 20, 2014 at 12:58:17PM +0200, Jiri Pirko wrote:
>> Mon, Aug 18, 2014 at 03:37:18PM CEST, pagupta@redhat.com wrote:
>> > This patch publishes maximum number of tun/tap queues allocated as a
>> > read_only module parameter which a user space application like libvirt
>> > can make use of to limit maximum number of queues. Value of read_only
>> > module parameter can be writable only at module load time. If no value is set
>> > at module load time a default value 256 is used which is equal to maximum number
>> > of vCPUS allowed by KVM.
>> >
>> > Administrator can specify maximum number of queues only at the driver
>> > module load time.
>> >
>> >Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
>> >---
>> > drivers/net/tun.c |   13 +++++++++++--
>> > 1 files changed, 11 insertions(+), 2 deletions(-)
>> >
>> >diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> >index acaaf67..1f518e2 100644
>> >--- a/drivers/net/tun.c
>> >+++ b/drivers/net/tun.c
>> >@@ -119,6 +119,9 @@ struct tap_filter {
>> > 
>> > #define TUN_FLOW_EXPIRE (3 * HZ)
>> > 
>> >+static int max_tap_queues = MAX_TAP_QUEUES;
>> >+module_param(max_tap_queues, int, S_IRUGO);
>> 
>> Please do not introduce new module paramaters. Please other ways to
>> interchange values with userspace.
>
>I suggested this initially, but thinking more about it, I agree.
>
>It's a global limit (necessary to limit memory utilization by
>userspace), but it should be possible to change it
>after module load.
>
>Additionally, userspace that has the FD should be able to
>retrieve the value without guessing that the FD is
>for the tun device (and not e.g. macvtap).
>To retrieve the value, an ioctl is probably the
>cleanest approach.
>
>To set it, how about a sysctl? I think the limit can also apply to
>all devices, not just tun.

Or netlink?

>
>-- 
>MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 2/4] tuntap: Publish tuntap maximum number of queues as module_param
  2014-08-20 11:46       ` Jiri Pirko
@ 2014-08-20 11:49         ` Michael S. Tsirkin
  2014-08-20 11:53           ` Jiri Pirko
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2014-08-20 11:49 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Pankaj Gupta, linux-kernel, netdev, davem, jasowang, dgibson,
	vfalico, edumazet, vyasevic, hkchu, wuzhy, xemul, therbert,
	bhutchings, xii, stephen

On Wed, Aug 20, 2014 at 01:46:20PM +0200, Jiri Pirko wrote:
> Wed, Aug 20, 2014 at 01:17:24PM CEST, mst@redhat.com wrote:
> >On Wed, Aug 20, 2014 at 12:58:17PM +0200, Jiri Pirko wrote:
> >> Mon, Aug 18, 2014 at 03:37:18PM CEST, pagupta@redhat.com wrote:
> >> > This patch publishes maximum number of tun/tap queues allocated as a
> >> > read_only module parameter which a user space application like libvirt
> >> > can make use of to limit maximum number of queues. Value of read_only
> >> > module parameter can be writable only at module load time. If no value is set
> >> > at module load time a default value 256 is used which is equal to maximum number
> >> > of vCPUS allowed by KVM.
> >> >
> >> > Administrator can specify maximum number of queues only at the driver
> >> > module load time.
> >> >
> >> >Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> >> >---
> >> > drivers/net/tun.c |   13 +++++++++++--
> >> > 1 files changed, 11 insertions(+), 2 deletions(-)
> >> >
> >> >diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >> >index acaaf67..1f518e2 100644
> >> >--- a/drivers/net/tun.c
> >> >+++ b/drivers/net/tun.c
> >> >@@ -119,6 +119,9 @@ struct tap_filter {
> >> > 
> >> > #define TUN_FLOW_EXPIRE (3 * HZ)
> >> > 
> >> >+static int max_tap_queues = MAX_TAP_QUEUES;
> >> >+module_param(max_tap_queues, int, S_IRUGO);
> >> 
> >> Please do not introduce new module paramaters. Please other ways to
> >> interchange values with userspace.
> >
> >I suggested this initially, but thinking more about it, I agree.
> >
> >It's a global limit (necessary to limit memory utilization by
> >userspace), but it should be possible to change it
> >after module load.
> >
> >Additionally, userspace that has the FD should be able to
> >retrieve the value without guessing that the FD is
> >for the tun device (and not e.g. macvtap).
> >To retrieve the value, an ioctl is probably the
> >cleanest approach.
> >
> >To set it, how about a sysctl? I think the limit can also apply to
> >all devices, not just tun.
> 
> Or netlink?

Are there examples of netlink being used to set global defaults
as opposed to per-device parameters?

> >
> >-- 
> >MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 2/4] tuntap: Publish tuntap maximum number of queues as module_param
  2014-08-20 11:49         ` Michael S. Tsirkin
@ 2014-08-20 11:53           ` Jiri Pirko
  0 siblings, 0 replies; 19+ messages in thread
From: Jiri Pirko @ 2014-08-20 11:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Pankaj Gupta, linux-kernel, netdev, davem, jasowang, dgibson,
	vfalico, edumazet, vyasevic, hkchu, wuzhy, xemul, therbert,
	bhutchings, xii, stephen

Wed, Aug 20, 2014 at 01:49:07PM CEST, mst@redhat.com wrote:
>On Wed, Aug 20, 2014 at 01:46:20PM +0200, Jiri Pirko wrote:
>> Wed, Aug 20, 2014 at 01:17:24PM CEST, mst@redhat.com wrote:
>> >On Wed, Aug 20, 2014 at 12:58:17PM +0200, Jiri Pirko wrote:
>> >> Mon, Aug 18, 2014 at 03:37:18PM CEST, pagupta@redhat.com wrote:
>> >> > This patch publishes maximum number of tun/tap queues allocated as a
>> >> > read_only module parameter which a user space application like libvirt
>> >> > can make use of to limit maximum number of queues. Value of read_only
>> >> > module parameter can be writable only at module load time. If no value is set
>> >> > at module load time a default value 256 is used which is equal to maximum number
>> >> > of vCPUS allowed by KVM.
>> >> >
>> >> > Administrator can specify maximum number of queues only at the driver
>> >> > module load time.
>> >> >
>> >> >Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
>> >> >---
>> >> > drivers/net/tun.c |   13 +++++++++++--
>> >> > 1 files changed, 11 insertions(+), 2 deletions(-)
>> >> >
>> >> >diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> >> >index acaaf67..1f518e2 100644
>> >> >--- a/drivers/net/tun.c
>> >> >+++ b/drivers/net/tun.c
>> >> >@@ -119,6 +119,9 @@ struct tap_filter {
>> >> > 
>> >> > #define TUN_FLOW_EXPIRE (3 * HZ)
>> >> > 
>> >> >+static int max_tap_queues = MAX_TAP_QUEUES;
>> >> >+module_param(max_tap_queues, int, S_IRUGO);
>> >> 
>> >> Please do not introduce new module paramaters. Please other ways to
>> >> interchange values with userspace.
>> >
>> >I suggested this initially, but thinking more about it, I agree.
>> >
>> >It's a global limit (necessary to limit memory utilization by
>> >userspace), but it should be possible to change it
>> >after module load.
>> >
>> >Additionally, userspace that has the FD should be able to
>> >retrieve the value without guessing that the FD is
>> >for the tun device (and not e.g. macvtap).
>> >To retrieve the value, an ioctl is probably the
>> >cleanest approach.
>> >
>> >To set it, how about a sysctl? I think the limit can also apply to
>> >all devices, not just tun.
>> 
>> Or netlink?
>
>Are there examples of netlink being used to set global defaults
>as opposed to per-device parameters?

That is so far not possible. But I believe that it can be implemented.
I'm just thinking out loud.

>
>> >
>> >-- 
>> >MST

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 2/4] tuntap: Publish tuntap maximum number of queues as module_param
  2014-08-20 11:17     ` Michael S. Tsirkin
  2014-08-20 11:46       ` Jiri Pirko
@ 2014-08-21  4:30       ` Jason Wang
  2014-08-22 11:52         ` Pankaj Gupta
  1 sibling, 1 reply; 19+ messages in thread
From: Jason Wang @ 2014-08-21  4:30 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jiri Pirko
  Cc: Pankaj Gupta, linux-kernel, netdev, davem, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen

On 08/20/2014 07:17 PM, Michael S. Tsirkin wrote:
> On Wed, Aug 20, 2014 at 12:58:17PM +0200, Jiri Pirko wrote:
>> > Mon, Aug 18, 2014 at 03:37:18PM CEST, pagupta@redhat.com wrote:
>>> > > This patch publishes maximum number of tun/tap queues allocated as a
>>> > > read_only module parameter which a user space application like libvirt
>>> > > can make use of to limit maximum number of queues. Value of read_only
>>> > > module parameter can be writable only at module load time. If no value is set
>>> > > at module load time a default value 256 is used which is equal to maximum number
>>> > > of vCPUS allowed by KVM.
>>> > >
>>> > > Administrator can specify maximum number of queues only at the driver
>>> > > module load time.
>>> > >
>>> > >Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
>>> > >---
>>> > > drivers/net/tun.c |   13 +++++++++++--
>>> > > 1 files changed, 11 insertions(+), 2 deletions(-)
>>> > >
>>> > >diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>> > >index acaaf67..1f518e2 100644
>>> > >--- a/drivers/net/tun.c
>>> > >+++ b/drivers/net/tun.c
>>> > >@@ -119,6 +119,9 @@ struct tap_filter {
>>> > > 
>>> > > #define TUN_FLOW_EXPIRE (3 * HZ)
>>> > > 
>>> > >+static int max_tap_queues = MAX_TAP_QUEUES;
>>> > >+module_param(max_tap_queues, int, S_IRUGO);
>> > 
>> > Please do not introduce new module paramaters. Please other ways to
>> > interchange values with userspace.
> I suggested this initially, but thinking more about it, I agree.
>
> It's a global limit (necessary to limit memory utilization by
> userspace), but it should be possible to change it
> after module load.

How about passing this limit through ifr during TUNSETIFF, so that
alloc_netdev_mq() can use this limit?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 2/4] tuntap: Publish tuntap maximum number of queues as module_param
  2014-08-21  4:30       ` Jason Wang
@ 2014-08-22 11:52         ` Pankaj Gupta
  2014-08-24 11:14           ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Pankaj Gupta @ 2014-08-22 11:52 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Jiri Pirko, linux-kernel, netdev, davem,
	dgibson, vfalico, edumazet, vyasevic, hkchu, wuzhy, xemul,
	therbert, bhutchings, xii, stephen


> 
> On 08/20/2014 07:17 PM, Michael S. Tsirkin wrote:
> > On Wed, Aug 20, 2014 at 12:58:17PM +0200, Jiri Pirko wrote:
> >> > Mon, Aug 18, 2014 at 03:37:18PM CEST, pagupta@redhat.com wrote:
> >>> > > This patch publishes maximum number of tun/tap queues allocated as a
> >>> > > read_only module parameter which a user space application like
> >>> > > libvirt
> >>> > > can make use of to limit maximum number of queues. Value of read_only
> >>> > > module parameter can be writable only at module load time. If no
> >>> > > value is set
> >>> > > at module load time a default value 256 is used which is equal to
> >>> > > maximum number
> >>> > > of vCPUS allowed by KVM.
> >>> > >
> >>> > > Administrator can specify maximum number of queues only at the driver
> >>> > > module load time.
> >>> > >
> >>> > >Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> >>> > >---
> >>> > > drivers/net/tun.c |   13 +++++++++++--
> >>> > > 1 files changed, 11 insertions(+), 2 deletions(-)
> >>> > >
> >>> > >diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >>> > >index acaaf67..1f518e2 100644
> >>> > >--- a/drivers/net/tun.c
> >>> > >+++ b/drivers/net/tun.c
> >>> > >@@ -119,6 +119,9 @@ struct tap_filter {
> >>> > > 
> >>> > > #define TUN_FLOW_EXPIRE (3 * HZ)
> >>> > > 
> >>> > >+static int max_tap_queues = MAX_TAP_QUEUES;
> >>> > >+module_param(max_tap_queues, int, S_IRUGO);
> >> > 
> >> > Please do not introduce new module paramaters. Please other ways to
> >> > interchange values with userspace.
> > I suggested this initially, but thinking more about it, I agree.
> >
> > It's a global limit (necessary to limit memory utilization by
> > userspace), but it should be possible to change it
> > after module load.
> 
> How about pass this limit through ifr during TUNSETIFF, then
> alloc_netdev_mq() can use this limit.

Any other ideas/comments from the experts? Or shall I re-post the other
patches in the series, except this one, until we agree on an approach?

> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 2/4] tuntap: Publish tuntap maximum number of queues as module_param
  2014-08-22 11:52         ` Pankaj Gupta
@ 2014-08-24 11:14           ` Michael S. Tsirkin
  2014-08-25  2:57             ` Jason Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2014-08-24 11:14 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: Jason Wang, Jiri Pirko, linux-kernel, netdev, davem, dgibson,
	vfalico, edumazet, vyasevic, hkchu, wuzhy, xemul, therbert,
	bhutchings, xii, stephen

On Fri, Aug 22, 2014 at 07:52:22AM -0400, Pankaj Gupta wrote:
> 
> > 
> > On 08/20/2014 07:17 PM, Michael S. Tsirkin wrote:
> > > On Wed, Aug 20, 2014 at 12:58:17PM +0200, Jiri Pirko wrote:
> > >> > Mon, Aug 18, 2014 at 03:37:18PM CEST, pagupta@redhat.com wrote:
> > >>> > > This patch publishes maximum number of tun/tap queues allocated as a
> > >>> > > read_only module parameter which a user space application like
> > >>> > > libvirt
> > >>> > > can make use of to limit maximum number of queues. Value of read_only
> > >>> > > module parameter can be writable only at module load time. If no
> > >>> > > value is set
> > >>> > > at module load time a default value 256 is used which is equal to
> > >>> > > maximum number
> > >>> > > of vCPUS allowed by KVM.
> > >>> > >
> > >>> > > Administrator can specify maximum number of queues only at the driver
> > >>> > > module load time.
> > >>> > >
> > >>> > >Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > >>> > >---
> > >>> > > drivers/net/tun.c |   13 +++++++++++--
> > >>> > > 1 files changed, 11 insertions(+), 2 deletions(-)
> > >>> > >
> > >>> > >diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > >>> > >index acaaf67..1f518e2 100644
> > >>> > >--- a/drivers/net/tun.c
> > >>> > >+++ b/drivers/net/tun.c
> > >>> > >@@ -119,6 +119,9 @@ struct tap_filter {
> > >>> > > 
> > >>> > > #define TUN_FLOW_EXPIRE (3 * HZ)
> > >>> > > 
> > >>> > >+static int max_tap_queues = MAX_TAP_QUEUES;
> > >>> > >+module_param(max_tap_queues, int, S_IRUGO);
> > >> > 
> > >> > Please do not introduce new module paramaters. Please other ways to
> > >> > interchange values with userspace.
> > > I suggested this initially, but thinking more about it, I agree.
> > >
> > > It's a global limit (necessary to limit memory utilization by
> > > userspace), but it should be possible to change it
> > > after module load.
> > 
> > How about pass this limit through ifr during TUNSETIFF, then
> > alloc_netdev_mq() can use this limit.
> 
> Any other ideas/comments from the experts. Or shall I re-repost other patches 
> in the series except this patch until we agree on one.
> 
> > 

It's kind of useless without a way for userspace to discover
how many queues it can create, no?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 3/4] tuntap: reduce the size of tun_struct by using flex array
  2014-08-18 13:37 ` [RFC 3/4] tuntap: reduce the size of tun_struct by using flex array Pankaj Gupta
@ 2014-08-25  1:35   ` David Gibson
  0 siblings, 0 replies; 19+ messages in thread
From: David Gibson @ 2014-08-25  1:35 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-kernel, netdev, davem, jasowang, mst, vfalico, edumazet,
	vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings, xii,
	stephen

[-- Attachment #1: Type: text/plain, Size: 748 bytes --]

On Mon, 18 Aug 2014 19:07:19 +0530
Pankaj Gupta <pagupta@redhat.com> wrote:

> This patch switches to flex array to implement the flow caches, it brings
> several advantages:
> 
> - Reduce the size of the tun_struct structure, which allows us to increase the
>   upper limit of queues in future.
> - Avoid higher order memory allocation. It will be useful when switching to
>   pure hashing in flow cache which may demand a larger size array in future.
> 
> After this patch, the size of tun_struct on x86_64 reduced from 8512 to
> 328
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>

Reviewed-by: David Gibson <dgibson@redhat.com>

-- 
David Gibson <dgibson@redhat.com>

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 4/4] tuntap: Increase the number of queues in tun
  2014-08-18 13:37 ` [RFC 4/4] tuntap: Increase the number of queues in tun Pankaj Gupta
@ 2014-08-25  1:37   ` David Gibson
  0 siblings, 0 replies; 19+ messages in thread
From: David Gibson @ 2014-08-25  1:37 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-kernel, netdev, davem, jasowang, mst, vfalico, edumazet,
	vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings, xii,
	stephen

[-- Attachment #1: Type: text/plain, Size: 780 bytes --]

On Mon, 18 Aug 2014 19:07:20 +0530
Pankaj Gupta <pagupta@redhat.com> wrote:

> Networking under kvm works best if we allocate a per-vCPU RX and TX
> queue in a virtual NIC. This requires a per-vCPU queue on the host side.
> 
> It is now safe to increase the maximum number of queues.
> Preceding patches:
> 	net: allow large number of rx queues
>   	tuntap: Reduce the size of tun_struct by using flex array
> 	tuntap: Publish tuntap max queue length as module_param
> 
> 	made sure this won't cause failures due to high order memory
> allocations. Increase it to 256: this is the max number of vCPUs
> KVM supports.
> 
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>

Reviewed-by: David Gibson <dgibson@redhat.com>

-- 
David Gibson <dgibson@redhat.com>

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 2/4] tuntap: Publish tuntap maximum number of queues as module_param
  2014-08-24 11:14           ` Michael S. Tsirkin
@ 2014-08-25  2:57             ` Jason Wang
  2014-08-26 15:30               ` Pankaj Gupta
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2014-08-25  2:57 UTC (permalink / raw)
  To: Michael S. Tsirkin, Pankaj Gupta
  Cc: Jiri Pirko, linux-kernel, netdev, davem, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen

On 08/24/2014 07:14 PM, Michael S. Tsirkin wrote:
> On Fri, Aug 22, 2014 at 07:52:22AM -0400, Pankaj Gupta wrote:
>>> On 08/20/2014 07:17 PM, Michael S. Tsirkin wrote:
>>>> On Wed, Aug 20, 2014 at 12:58:17PM +0200, Jiri Pirko wrote:
>>>>>> Mon, Aug 18, 2014 at 03:37:18PM CEST, pagupta@redhat.com wrote:
>>>>>>>> This patch publishes maximum number of tun/tap queues allocated as a
>>>>>>>> read_only module parameter which a user space application like
>>>>>>>> libvirt
>>>>>>>> can make use of to limit maximum number of queues. Value of read_only
>>>>>>>> module parameter can be writable only at module load time. If no
>>>>>>>> value is set
>>>>>>>> at module load time a default value 256 is used which is equal to
>>>>>>>> maximum number
>>>>>>>> of vCPUS allowed by KVM.
>>>>>>>>
>>>>>>>> Administrator can specify maximum number of queues only at the driver
>>>>>>>> module load time.
>>>>>>>>
>>>>>>>> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
>>>>>>>> ---
>>>>>>>> drivers/net/tun.c |   13 +++++++++++--
>>>>>>>> 1 files changed, 11 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>>>>>> index acaaf67..1f518e2 100644
>>>>>>>> --- a/drivers/net/tun.c
>>>>>>>> +++ b/drivers/net/tun.c
>>>>>>>> @@ -119,6 +119,9 @@ struct tap_filter {
>>>>>>>>
>>>>>>>> #define TUN_FLOW_EXPIRE (3 * HZ)
>>>>>>>>
>>>>>>>> +static int max_tap_queues = MAX_TAP_QUEUES;
>>>>>>>> +module_param(max_tap_queues, int, S_IRUGO);
>>>>>> Please do not introduce new module paramaters. Please other ways to
>>>>>> interchange values with userspace.
>>>> I suggested this initially, but thinking more about it, I agree.
>>>>
>>>> It's a global limit (necessary to limit memory utilization by
>>>> userspace), but it should be possible to change it
>>>> after module load.
>>> How about pass this limit through ifr during TUNSETIFF, then
>>> alloc_netdev_mq() can use this limit.
>> Any other ideas/comments from the experts. Or shall I re-repost other patches 
>> in the series except this patch until we agree on one.
>>
> It's kind of useless without a way for userspace to discover
> how many queues it can create, no?
>

We can implement ethtool_get_channels for tuntap. But I'm still not
clear why this is necessary.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 2/4] tuntap: Publish tuntap maximum number of queues as module_param
  2014-08-25  2:57             ` Jason Wang
@ 2014-08-26 15:30               ` Pankaj Gupta
  0 siblings, 0 replies; 19+ messages in thread
From: Pankaj Gupta @ 2014-08-26 15:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Jiri Pirko, linux-kernel, netdev, davem,
	dgibson, vfalico, edumazet, vyasevic, hkchu, wuzhy, xemul,
	therbert, bhutchings, xii, stephen


> On 08/24/2014 07:14 PM, Michael S. Tsirkin wrote:
> > On Fri, Aug 22, 2014 at 07:52:22AM -0400, Pankaj Gupta wrote:
> >>> On 08/20/2014 07:17 PM, Michael S. Tsirkin wrote:
> >>>> On Wed, Aug 20, 2014 at 12:58:17PM +0200, Jiri Pirko wrote:
> >>>>>> Mon, Aug 18, 2014 at 03:37:18PM CEST, pagupta@redhat.com wrote:
> >>>>>>>> This patch publishes maximum number of tun/tap queues allocated as a
> >>>>>>>> read_only module parameter which a user space application like
> >>>>>>>> libvirt
> >>>>>>>> can make use of to limit maximum number of queues. Value of
> >>>>>>>> read_only
> >>>>>>>> module parameter can be writable only at module load time. If no
> >>>>>>>> value is set
> >>>>>>>> at module load time a default value 256 is used which is equal to
> >>>>>>>> maximum number
> >>>>>>>> of vCPUS allowed by KVM.
> >>>>>>>>
> >>>>>>>> Administrator can specify maximum number of queues only at the
> >>>>>>>> driver
> >>>>>>>> module load time.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> >>>>>>>> ---
> >>>>>>>> drivers/net/tun.c |   13 +++++++++++--
> >>>>>>>> 1 files changed, 11 insertions(+), 2 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >>>>>>>> index acaaf67..1f518e2 100644
> >>>>>>>> --- a/drivers/net/tun.c
> >>>>>>>> +++ b/drivers/net/tun.c
> >>>>>>>> @@ -119,6 +119,9 @@ struct tap_filter {
> >>>>>>>>
> >>>>>>>> #define TUN_FLOW_EXPIRE (3 * HZ)
> >>>>>>>>
> >>>>>>>> +static int max_tap_queues = MAX_TAP_QUEUES;
> >>>>>>>> +module_param(max_tap_queues, int, S_IRUGO);
> >>>>>> Please do not introduce new module paramaters. Please other ways to
> >>>>>> interchange values with userspace.
> >>>> I suggested this initially, but thinking more about it, I agree.
> >>>>
> >>>> It's a global limit (necessary to limit memory utilization by
> >>>> userspace), but it should be possible to change it
> >>>> after module load.
> >>> How about pass this limit through ifr during TUNSETIFF, then
> >>> alloc_netdev_mq() can use this limit.
> >> Any other ideas/comments from the experts. Or shall I re-repost other
> >> patches
> >> in the series except this patch until we agree on one.
> >>
> > It's kind of useless without a way for userspace to discover
> > how many queues it can create, no?
> >
> 
> We can implement ethtool_get_channels for tuntap. But I'm still not
> clear why this is necessary.

ethtool_get_channels for tuntap sounds like a good idea for retrieving the
number of queues configured.
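
For illustration only (not part of this series), such a hook might look
roughly like the sketch below; the callback and field names come from the
standard ethtool_ops interface, and tun->numqueues is the driver's current
queue count.

	/* Hypothetical sketch: report the configured and maximum queue
	 * counts through ethtool (visible via "ethtool -l <dev>"). */
	static void tun_get_channels(struct net_device *dev,
				     struct ethtool_channels *channels)
	{
		struct tun_struct *tun = netdev_priv(dev);

		channels->combined_count = tun->numqueues;
		channels->max_combined = MAX_TAP_QUEUES;
	}

	/* wired up via .get_channels = tun_get_channels in tun_ethtool_ops */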
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread
