* [PATCH v3 net-net 0/2] Increase the limit of tuntap queues
@ 2014-12-03  7:19 Pankaj Gupta
  2014-12-03  7:19 ` [PATCH v3 net-next 1/2] net: allow large number of rx queues Pankaj Gupta
  2014-12-03  7:19 ` [PATCH v3 net-next 2/2 tuntap: Increase the number of queues in tun Pankaj Gupta
  0 siblings, 2 replies; 11+ messages in thread
From: Pankaj Gupta @ 2014-12-03  7:19 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: davem, jasowang, mst, dgibson, vfalico, edumazet, vyasevic,
	hkchu, wuzhy, xemul, therbert, bhutchings, xii, stephen, jiri,
	sergei.shtylyov, Pankaj Gupta

Networking under KVM works best if we allocate a per-vCPU rx and tx
queue in a virtual NIC. This requires a per-vCPU queue on the host side.
Modern physical NICs support multiqueue with a large number of queues.
To scale a vNIC to run multiple queues in parallel, up to the maximum
number of vCPUs, we need to increase the number of queues supported by
tuntap.

Changes from v2:
PATCH 3: David Miller     - flex array adds an extra level of indirection
                            for a preallocated array (dropped; the flow array
                            is allocated using kzalloc() with a fallback to
                            vzalloc()).

Changes from v1:
PATCH 2: David Miller     - sysctl changes to limit the number of queues are
                            not required for unprivileged users (dropped).

Changes from RFC:
PATCH 1: Sergei Shtylyov  - Add an empty line after declarations.
PATCH 2: Jiri Pirko       - Do not introduce new module parameters.
         Michael S. Tsirkin - We can use sysctl for limiting the max number
                            of queues.

This series increases the number of tuntap queues. The original work was
done by 'jasowang@redhat.com'; I am using this patch series as a reference:
'https://lkml.org/lkml/2013/6/19/29'. As per the discussion in that patch series:

There were two reasons which prevented us from increasing the number of tun queues:

- The netdev_queue array in the netdevice was allocated through kmalloc, which
  may require a high order memory allocation when we have many queues.
  E.g. sizeof(struct netdev_queue) is 320, which means a high order allocation
  would happen when the device has more than 16 queues.

- We store the hash buckets in tun_struct, which results in a very large
  tun_struct; this high order memory allocation fails easily when memory is
  fragmented.

Commit 60877a32bce00041528576e6b8df5abe9251fa73 already allows a large number
of tx queues by falling back to vzalloc() when kmalloc() fails.

This series tries to address the following issues:

- Increase the number of rx queues the same way it is done for tx queues:
  by falling back to vzalloc() when memory allocation with kmalloc() fails
  (see the sketch after this list).

- Increase the number of tuntap queues to 256, equal to the maximum number
  of vCPUs allowed in a KVM guest.
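
For reference, a minimal sketch of the allocation pattern this series relies
on (the helper name below is made up for illustration; the real change is in
netif_alloc_rx_queues() in net/core/dev.c and appears in patch 1): try a
physically contiguous kzalloc() first, trying reasonably hard but without
warning on failure, and fall back to vzalloc() only when that fails.

static void *queue_array_zalloc(size_t count, size_t elem_size)
{
	size_t sz = count * elem_size;
	void *p;

	/* __GFP_NOWARN: no allocation-failure splat in the log;
	 * __GFP_REPEAT: try harder so the vmalloc path stays rare. */
	p = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
	if (!p)
		p = vzalloc(sz);
	return p;	/* either way, release with kvfree() */
}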

I have done some regression testing with a sample program which creates
tun/tap devices in single queue and multiqueue mode. I have also tested with
multiple parallel netperf sessions from guest to host for different
combinations of queues and CPUs. It works fine without much increase in CPU
load as the number of queues grows, though I was limited to 4 physical CPUs.


For this test, vhost threads were pinned to separate CPUs. Below are the results:
Host kernel: 3.18-rc4, Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 4 CPUs
NIC : Ethernet controller: Intel Corporation 82579LM Gigabit Network


Patch Applied  %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle  throughput
Single Queue
-------------
Before :all    7.94    0.01    1.79    3.00    0.26    0.15    0.00    3.21    0.00   83.64  64924.94
After  :all    2.15    0.00    0.82    2.21    0.08    0.13    0.00    0.83    0.00   93.79  68799.88

2 Queues
--------
Before :all    6.75    0.06    1.91    3.93    0.23    0.21    0.00    3.84    0.00   83.07  69569.30
After  :all    2.12    0.00    0.92    2.51    0.08    0.15    0.00    1.19    0.00   93.02  71386.79

4 Queues
--------
Before :all    6.09    0.05    1.88    3.83    0.22    0.22    0.00    3.74    0.00   83.98  76170.60
After  :all    2.12    0.00    1.01    2.72    0.09    0.16    0.00    1.47    0.00   92.43  75492.34

8 Queues
--------
Before :all    5.80    0.05    1.91    3.97    0.21    0.23    0.00    3.88    0.00   83.96  70843.88
After  :all    2.06    0.00    1.06    2.77    0.09    0.17    0.00    1.66    0.00   92.19  74486.31
16 Queues
--------------
After  :all    2.04    0.00    1.13    2.90    0.10    0.18    0.00    2.02    0.00   91.63  73227.45

Patches Summary:
  net: allow large number of rx queues
  tuntap: Increase the number of queues in tun

 drivers/net/tun.c |    9 +++++----
 net/core/dev.c    |   19 +++++++++++++------
 2 files changed, 18 insertions(+), 10 deletions(-)


* [PATCH v3 net-next 1/2] net: allow large number of rx queues
  2014-12-03  7:19 [PATCH v3 net-net 0/2] Increase the limit of tuntap queues Pankaj Gupta
@ 2014-12-03  7:19 ` Pankaj Gupta
  2014-12-03  9:42   ` Michael S. Tsirkin
  2014-12-03  7:19 ` [PATCH v3 net-next 2/2 tuntap: Increase the number of queues in tun Pankaj Gupta
  1 sibling, 1 reply; 11+ messages in thread
From: Pankaj Gupta @ 2014-12-03  7:19 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: davem, jasowang, mst, dgibson, vfalico, edumazet, vyasevic,
	hkchu, wuzhy, xemul, therbert, bhutchings, xii, stephen, jiri,
	sergei.shtylyov, Pankaj Gupta

netif_alloc_rx_queues() uses kcalloc() to allocate memory
for the "struct netdev_rx_queue *_rx" array.
If we are doing a large rx queue allocation, kcalloc() might
fail, so this patch adds a fallback to vzalloc().
A similar implementation already exists for tx queue allocation
in netif_alloc_netdev_queues().

We avoid failure of high order memory allocations
with the help of vzalloc(); this allows large
rx and tx queue allocations, which in turn lets us
increase the number of queues in tun.

As vmalloc() adds overhead on a critical network path,
the __GFP_REPEAT flag is used with kzalloc() so that the
fallback happens only when really needed.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Gibson <dgibson@redhat.com>
---
 net/core/dev.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index e916ba8..abe9560 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6059,17 +6059,25 @@ void netif_stacked_transfer_operstate(const struct net_device *rootdev,
 EXPORT_SYMBOL(netif_stacked_transfer_operstate);
 
 #ifdef CONFIG_SYSFS
+static void netif_free_rx_queues(struct net_device *dev)
+{
+	kvfree(dev->_rx);
+}
+
 static int netif_alloc_rx_queues(struct net_device *dev)
 {
 	unsigned int i, count = dev->num_rx_queues;
 	struct netdev_rx_queue *rx;
+	size_t sz = count * sizeof(*rx);
 
 	BUG_ON(count < 1);
 
-	rx = kcalloc(count, sizeof(struct netdev_rx_queue), GFP_KERNEL);
-	if (!rx)
-		return -ENOMEM;
-
+	rx = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
+	if (!rx) {
+		rx = vzalloc(sz);
+		if (!rx)
+			return -ENOMEM;
+	}
 	dev->_rx = rx;
 
 	for (i = 0; i < count; i++)
@@ -6698,9 +6706,8 @@ void free_netdev(struct net_device *dev)
 
 	netif_free_tx_queues(dev);
 #ifdef CONFIG_SYSFS
-	kfree(dev->_rx);
+	netif_free_rx_queues(dev);
 #endif
-
 	kfree(rcu_dereference_protected(dev->ingress_queue, 1));
 
 	/* Flush device addresses */
-- 
1.8.3.1



* [PATCH v3 net-next 2/2 tuntap: Increase the number of queues in tun.
  2014-12-03  7:19 [PATCH v3 net-net 0/2] Increase the limit of tuntap queues Pankaj Gupta
  2014-12-03  7:19 ` [PATCH v3 net-next 1/2] net: allow large number of rx queues Pankaj Gupta
@ 2014-12-03  7:19 ` Pankaj Gupta
  2014-12-03  9:52   ` Michael S. Tsirkin
  1 sibling, 1 reply; 11+ messages in thread
From: Pankaj Gupta @ 2014-12-03  7:19 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: davem, jasowang, mst, dgibson, vfalico, edumazet, vyasevic,
	hkchu, wuzhy, xemul, therbert, bhutchings, xii, stephen, jiri,
	sergei.shtylyov, Pankaj Gupta

Networking under kvm works best if we allocate a per-vCPU RX and TX
queue in a virtual NIC. This requires a per-vCPU queue on the host side.

It is now safe to increase the maximum number of queues.
Preceding patche: 'net: allow large number of rx queues'
made sure this won't cause failures due to high order memory
allocations. Increase it to 256: this is the max number of vCPUs
KVM supports.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: David Gibson <dgibson@redhat.com>
---
 drivers/net/tun.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e3fa65a..a19dc5f8 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -113,10 +113,11 @@ struct tap_filter {
 	unsigned char	addr[FLT_EXACT_COUNT][ETH_ALEN];
 };
 
-/* DEFAULT_MAX_NUM_RSS_QUEUES were chosen to let the rx/tx queues allocated for
- * the netdevice to be fit in one page. So we can make sure the success of
- * memory allocation. TODO: increase the limit. */
-#define MAX_TAP_QUEUES DEFAULT_MAX_NUM_RSS_QUEUES
+/* MAX_TAP_QUEUES 256 is chosen to allow rx/tx queues to be equal
+ * to max number of vCPUS in guest. Also, we are making sure here
+ * queue memory allocation do not fail.
+ */
+#define MAX_TAP_QUEUES 256
 #define MAX_TAP_FLOWS  4096
 
 #define TUN_FLOW_EXPIRE (3 * HZ)
-- 
1.8.3.1



* Re: [PATCH v3 net-next 1/2] net: allow large number of rx queues
  2014-12-03  7:19 ` [PATCH v3 net-next 1/2] net: allow large number of rx queues Pankaj Gupta
@ 2014-12-03  9:42   ` Michael S. Tsirkin
  2014-12-04 10:45     ` Pankaj Gupta
  0 siblings, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2014-12-03  9:42 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-kernel, netdev, davem, jasowang, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen, jiri, sergei.shtylyov

On Wed, Dec 03, 2014 at 12:49:36PM +0530, Pankaj Gupta wrote:
> netif_alloc_rx_queues() uses kcalloc() to allocate memory
> for "struct netdev_queue *_rx" array.
> If we are doing large rx queue allocation kcalloc() might
> fail, so this patch does a fallback to vzalloc().
> Similar implementation is done for tx queue allocation in
> netif_alloc_netdev_queues().
> 
> We avoid failure of high order memory allocation
> with the help of vzalloc(), this allows us to do large
> rx and tx queue allocation which in turn helps us to
> increase the number of queues in tun.
> 
> As vmalloc() adds overhead on a critical network path,
> __GFP_REPEAT flag is used with kzalloc() to do this fallback
> only when really needed.
> 
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Reviewed-by: David Gibson <dgibson@redhat.com>
> ---
>  net/core/dev.c | 19 +++++++++++++------
>  1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index e916ba8..abe9560 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6059,17 +6059,25 @@ void netif_stacked_transfer_operstate(const struct net_device *rootdev,
>  EXPORT_SYMBOL(netif_stacked_transfer_operstate);
>  
>  #ifdef CONFIG_SYSFS
> +static void netif_free_rx_queues(struct net_device *dev)
> +{
> +	kvfree(dev->_rx);
> +}
> +

I would just open-code this.
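
For illustration, the open-coded variant in free_netdev() would look roughly
like this (reconstructed from the diff above, not a literal upstream excerpt;
kvfree() handles both the kzalloc() and the vzalloc() case, so the one-line
helper can simply go away):

	netif_free_tx_queues(dev);
#ifdef CONFIG_SYSFS
	kvfree(dev->_rx);
#endif

	kfree(rcu_dereference_protected(dev->ingress_queue, 1));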

>  static int netif_alloc_rx_queues(struct net_device *dev)
>  {
>  	unsigned int i, count = dev->num_rx_queues;
>  	struct netdev_rx_queue *rx;
> +	size_t sz = count * sizeof(*rx);
>  
>  	BUG_ON(count < 1);
>  
> -	rx = kcalloc(count, sizeof(struct netdev_rx_queue), GFP_KERNEL);
> -	if (!rx)
> -		return -ENOMEM;
> -
> +	rx = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> +	if (!rx) {
> +		rx = vzalloc(sz);
> +		if (!rx)
> +			return -ENOMEM;
> +	}
>  	dev->_rx = rx;
>  
>  	for (i = 0; i < count; i++)
> @@ -6698,9 +6706,8 @@ void free_netdev(struct net_device *dev)
>  
>  	netif_free_tx_queues(dev);
>  #ifdef CONFIG_SYSFS
> -	kfree(dev->_rx);
> +	netif_free_rx_queues(dev);
>  #endif
> -

and I think it's nicer with the empty line.

>  	kfree(rcu_dereference_protected(dev->ingress_queue, 1));
>  
>  	/* Flush device addresses */
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v3 net-next 2/2 tuntap: Increase the number of queues in tun.
  2014-12-03  7:19 ` [PATCH v3 net-next 2/2 tuntap: Increase the number of queues in tun Pankaj Gupta
@ 2014-12-03  9:52   ` Michael S. Tsirkin
  2014-12-04  2:55     ` Jason Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Michael S. Tsirkin @ 2014-12-03  9:52 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-kernel, netdev, davem, jasowang, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen, jiri, sergei.shtylyov

On Wed, Dec 03, 2014 at 12:49:37PM +0530, Pankaj Gupta wrote:
> Networking under kvm works best if we allocate a per-vCPU RX and TX
> queue in a virtual NIC. This requires a per-vCPU queue on the host side.
> 
> It is now safe to increase the maximum number of queues.
> Preceding patche: 'net: allow large number of rx queues'

s/patche/patch/

> made sure this won't cause failures due to high order memory
> allocations. Increase it to 256: this is the max number of vCPUs
> KVM supports.
> 
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> Reviewed-by: David Gibson <dgibson@redhat.com>

Hmm it's kind of nasty that each tun device is now using x16 memory.
Maybe we should look at using a flex array instead, and removing the
limitation altogether (e.g. make it INT_MAX)?
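
To make that concrete, a rough sketch of the flex array variant (hypothetical
function names; it uses the <linux/flex_array.h> API present in kernels of
this era, and would replace the fixed "struct tun_file __rcu
*tfiles[MAX_TAP_QUEUES];" field with storage sized at allocation time; RCU
annotations are omitted for brevity):

#include <linux/flex_array.h>

static int tun_tfiles_alloc(struct flex_array **tfiles, unsigned int max)
{
	struct flex_array *fa;
	int err;

	fa = flex_array_alloc(sizeof(struct tun_file *), max, GFP_KERNEL);
	if (!fa)
		return -ENOMEM;
	/* pre-populate the part pages so later flex_array_put() cannot fail */
	err = flex_array_prealloc(fa, 0, max, GFP_KERNEL);
	if (err) {
		flex_array_free(fa);
		return err;
	}
	*tfiles = fa;
	return 0;
}

static struct tun_file *tun_tfile_lookup(struct flex_array *tfiles,
					 unsigned int index)
{
	struct tun_file **slot = flex_array_get(tfiles, index);

	return slot ? *slot : NULL;
}

The trade-off, as noted further down in the thread, is an extra level of
indirection on every queue lookup.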


> ---
>  drivers/net/tun.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index e3fa65a..a19dc5f8 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -113,10 +113,11 @@ struct tap_filter {
>  	unsigned char	addr[FLT_EXACT_COUNT][ETH_ALEN];
>  };
>  
> -/* DEFAULT_MAX_NUM_RSS_QUEUES were chosen to let the rx/tx queues allocated for
> - * the netdevice to be fit in one page. So we can make sure the success of
> - * memory allocation. TODO: increase the limit. */
> -#define MAX_TAP_QUEUES DEFAULT_MAX_NUM_RSS_QUEUES
> +/* MAX_TAP_QUEUES 256 is chosen to allow rx/tx queues to be equal
> + * to max number of vCPUS in guest. Also, we are making sure here
> + * queue memory allocation do not fail.

It's not queue memory allocation anymore, is it?
I would say "
This also helps the tfiles field fit in 4K, so the whole tun
device only needs an order-1 allocation.
"

> + */
> +#define MAX_TAP_QUEUES 256
>  #define MAX_TAP_FLOWS  4096
>  
>  #define TUN_FLOW_EXPIRE (3 * HZ)
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v3 net-next 2/2 tuntap: Increase the number of queues in tun.
  2014-12-03  9:52   ` Michael S. Tsirkin
@ 2014-12-04  2:55     ` Jason Wang
  2014-12-04 10:20       ` Michael S. Tsirkin
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Wang @ 2014-12-04  2:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Pankaj Gupta, linux-kernel, netdev, davem, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen, jiri, sergei.shtylyov



On Wed, Dec 3, 2014 at 5:52 PM, Michael S. Tsirkin <mst@redhat.com> 
wrote:
> On Wed, Dec 03, 2014 at 12:49:37PM +0530, Pankaj Gupta wrote:
>>  Networking under kvm works best if we allocate a per-vCPU RX and TX
>>  queue in a virtual NIC. This requires a per-vCPU queue on the host 
>> side.
>>  
>>  It is now safe to increase the maximum number of queues.
>>  Preceding patche: 'net: allow large number of rx queues'
> 
> s/patche/patch/
> 
>>  made sure this won't cause failures due to high order memory
>>  allocations. Increase it to 256: this is the max number of vCPUs
>>  KVM supports.
>>  
>>  Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
>>  Reviewed-by: David Gibson <dgibson@redhat.com>
> 
> Hmm it's kind of nasty that each tun device is now using x16 memory.
> Maybe we should look at using a flex array instead, and removing the
> limitation altogether (e.g. make it INT_MAX)?

But this only happens when IFF_MULTIQUEUE were used.
And core has vmalloc() fallback.
So probably not a big issue?

> 
> 
> 
>>  ---
>>   drivers/net/tun.c | 9 +++++----
>>   1 file changed, 5 insertions(+), 4 deletions(-)
>>  
>>  diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>  index e3fa65a..a19dc5f8 100644
>>  --- a/drivers/net/tun.c
>>  +++ b/drivers/net/tun.c
>>  @@ -113,10 +113,11 @@ struct tap_filter {
>>   	unsigned char	addr[FLT_EXACT_COUNT][ETH_ALEN];
>>   };
>>   
>>  -/* DEFAULT_MAX_NUM_RSS_QUEUES were chosen to let the rx/tx queues 
>> allocated for
>>  - * the netdevice to be fit in one page. So we can make sure the 
>> success of
>>  - * memory allocation. TODO: increase the limit. */
>>  -#define MAX_TAP_QUEUES DEFAULT_MAX_NUM_RSS_QUEUES
>>  +/* MAX_TAP_QUEUES 256 is chosen to allow rx/tx queues to be equal
>>  + * to max number of vCPUS in guest. Also, we are making sure here
>>  + * queue memory allocation do not fail.
> 
> It's not queue memory allocation anymore, is it?
> I would say "
> This also helps the tfiles field fit in 4K, so the whole tun
> device only needs an order-1 allocation.
> "
> 
>>  + */
>>  +#define MAX_TAP_QUEUES 256
>>   #define MAX_TAP_FLOWS  4096
>>   
>>   #define TUN_FLOW_EXPIRE (3 * HZ)
>>  -- 
>>  1.8.3.1
>>  
>>  --
>>  To unsubscribe from this list: send the line "unsubscribe netdev" in
>>  the body of a message to majordomo@vger.kernel.org
>>  More majordomo info at  http://vger.kernel.org/majordomo-info.html



* Re: [PATCH v3 net-next 2/2 tuntap: Increase the number of queues in tun.
  2014-12-04  2:55     ` Jason Wang
@ 2014-12-04 10:20       ` Michael S. Tsirkin
  2014-12-04 10:42         ` Pankaj Gupta
  2014-12-05  7:35         ` Jason Wang
  0 siblings, 2 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2014-12-04 10:20 UTC (permalink / raw)
  To: Jason Wang
  Cc: Pankaj Gupta, linux-kernel, netdev, davem, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen, jiri, sergei.shtylyov

On Thu, Dec 04, 2014 at 03:03:34AM +0008, Jason Wang wrote:
> 
> 
> On Wed, Dec 3, 2014 at 5:52 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> >On Wed, Dec 03, 2014 at 12:49:37PM +0530, Pankaj Gupta wrote:
> >> Networking under kvm works best if we allocate a per-vCPU RX and TX
> >> queue in a virtual NIC. This requires a per-vCPU queue on the host
> >>side.
> >> It is now safe to increase the maximum number of queues.
> >> Preceding patche: 'net: allow large number of rx queues'
> >
> >s/patche/patch/
> >
> >> made sure this won't cause failures due to high order memory
> >> allocations. Increase it to 256: this is the max number of vCPUs
> >> KVM supports.
> >> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> >> Reviewed-by: David Gibson <dgibson@redhat.com>
> >
> >Hmm it's kind of nasty that each tun device is now using x16 memory.
> >Maybe we should look at using a flex array instead, and removing the
> >limitation altogether (e.g. make it INT_MAX)?
> 
> But this only happens when IFF_MULTIQUEUE were used.

I refer to this field:
        struct tun_file __rcu   *tfiles[MAX_TAP_QUEUES];
if we make MAX_TAP_QUEUES 256, this will use 4K bytes,
apparently unconditionally.


> And core has vmalloc() fallback.
> So probably not a big issue?
> >
> >
> >
> >> ---
> >>  drivers/net/tun.c | 9 +++++----
> >>  1 file changed, 5 insertions(+), 4 deletions(-)
> >> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >> index e3fa65a..a19dc5f8 100644
> >> --- a/drivers/net/tun.c
> >> +++ b/drivers/net/tun.c
> >> @@ -113,10 +113,11 @@ struct tap_filter {
> >>  	unsigned char	addr[FLT_EXACT_COUNT][ETH_ALEN];
> >>  };
> >>    -/* DEFAULT_MAX_NUM_RSS_QUEUES were chosen to let the rx/tx queues
> >>allocated for
> >> - * the netdevice to be fit in one page. So we can make sure the
> >>success of
> >> - * memory allocation. TODO: increase the limit. */
> >> -#define MAX_TAP_QUEUES DEFAULT_MAX_NUM_RSS_QUEUES
> >> +/* MAX_TAP_QUEUES 256 is chosen to allow rx/tx queues to be equal
> >> + * to max number of vCPUS in guest. Also, we are making sure here
> >> + * queue memory allocation do not fail.
> >
> >It's not queue memory allocation anymore, is it?
> >I would say "
> >This also helps the tfiles field fit in 4K, so the whole tun
> >device only needs an order-1 allocation.
> >"
> >
> >> + */
> >> +#define MAX_TAP_QUEUES 256
> >>  #define MAX_TAP_FLOWS  4096
> >>  #define TUN_FLOW_EXPIRE (3 * HZ)
> >> --  1.8.3.1
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH v3 net-next 2/2 tuntap: Increase the number of queues in tun.
  2014-12-04 10:20       ` Michael S. Tsirkin
@ 2014-12-04 10:42         ` Pankaj Gupta
  2014-12-05  7:35         ` Jason Wang
  1 sibling, 0 replies; 11+ messages in thread
From: Pankaj Gupta @ 2014-12-04 10:42 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, linux-kernel, netdev, davem, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen, jiri, sergei shtylyov


> 
> On Thu, Dec 04, 2014 at 03:03:34AM +0008, Jason Wang wrote:
> > 
> > 
> > On Wed, Dec 3, 2014 at 5:52 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >On Wed, Dec 03, 2014 at 12:49:37PM +0530, Pankaj Gupta wrote:
> > >> Networking under kvm works best if we allocate a per-vCPU RX and TX
> > >> queue in a virtual NIC. This requires a per-vCPU queue on the host
> > >>side.
> > >> It is now safe to increase the maximum number of queues.
> > >> Preceding patche: 'net: allow large number of rx queues'
> > >
> > >s/patche/patch/
> > >
> > >> made sure this won't cause failures due to high order memory
> > >> allocations. Increase it to 256: this is the max number of vCPUs
> > >> KVM supports.
> > >> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > >> Reviewed-by: David Gibson <dgibson@redhat.com>
> > >
> > >Hmm it's kind of nasty that each tun device is now using x16 memory.
> > >Maybe we should look at using a flex array instead, and removing the
> > >limitation altogether (e.g. make it INT_MAX)?
> > 
> > But this only happens when IFF_MULTIQUEUE were used.
> 
> I refer to this field:
>         struct tun_file __rcu   *tfiles[MAX_TAP_QUEUES];
> if we make MAX_TAP_QUEUES 256, this will use 4K bytes,
> apparently unconditionally.

Are you saying to use a flex array for tfiles in place of the array of
tun_file pointers, and grow it dynamically when/if needed?

If yes, I agree it will then be all order-1 allocations, but it will add a
level of indirection, as pointed out by DaveM for the flow array, this time
for tfiles. But yes, dynamically allocating the flex array as per usage will
help to minimise the memory pressure, which in this case is high (256 entries).

> 
> 
> > And core has vmalloc() fallback.
> > So probably not a big issue?
> > >
> > >
> > >
> > >> ---
> > >>  drivers/net/tun.c | 9 +++++----
> > >>  1 file changed, 5 insertions(+), 4 deletions(-)
> > >> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > >> index e3fa65a..a19dc5f8 100644
> > >> --- a/drivers/net/tun.c
> > >> +++ b/drivers/net/tun.c
> > >> @@ -113,10 +113,11 @@ struct tap_filter {
> > >>  	unsigned char	addr[FLT_EXACT_COUNT][ETH_ALEN];
> > >>  };
> > >>    -/* DEFAULT_MAX_NUM_RSS_QUEUES were chosen to let the rx/tx queues
> > >>allocated for
> > >> - * the netdevice to be fit in one page. So we can make sure the
> > >>success of
> > >> - * memory allocation. TODO: increase the limit. */
> > >> -#define MAX_TAP_QUEUES DEFAULT_MAX_NUM_RSS_QUEUES
> > >> +/* MAX_TAP_QUEUES 256 is chosen to allow rx/tx queues to be equal
> > >> + * to max number of vCPUS in guest. Also, we are making sure here
> > >> + * queue memory allocation do not fail.
> > >
> > >It's not queue memory allocation anymore, is it?
> > >I would say "
> > >This also helps the tfiles field fit in 4K, so the whole tun
> > >device only needs an order-1 allocation.
> > >"
> > >
> > >> + */
> > >> +#define MAX_TAP_QUEUES 256
> > >>  #define MAX_TAP_FLOWS  4096
> > >>  #define TUN_FLOW_EXPIRE (3 * HZ)
> > >> --  1.8.3.1
> > >> --
> > >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> > >> the body of a message to majordomo@vger.kernel.org
> > >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


* Re: [PATCH v3 net-next 1/2] net: allow large number of rx queues
  2014-12-03  9:42   ` Michael S. Tsirkin
@ 2014-12-04 10:45     ` Pankaj Gupta
  0 siblings, 0 replies; 11+ messages in thread
From: Pankaj Gupta @ 2014-12-04 10:45 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, netdev, davem, jasowang, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen, jiri, sergei shtylyov


> 
> On Wed, Dec 03, 2014 at 12:49:36PM +0530, Pankaj Gupta wrote:
> > netif_alloc_rx_queues() uses kcalloc() to allocate memory
> > for "struct netdev_queue *_rx" array.
> > If we are doing large rx queue allocation kcalloc() might
> > fail, so this patch does a fallback to vzalloc().
> > Similar implementation is done for tx queue allocation in
> > netif_alloc_netdev_queues().
> > 
> > We avoid failure of high order memory allocation
> > with the help of vzalloc(), this allows us to do large
> > rx and tx queue allocation which in turn helps us to
> > increase the number of queues in tun.
> > 
> > As vmalloc() adds overhead on a critical network path,
> > __GFP_REPEAT flag is used with kzalloc() to do this fallback
> > only when really needed.
> > 
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > Reviewed-by: David Gibson <dgibson@redhat.com>
> > ---
> >  net/core/dev.c | 19 +++++++++++++------
> >  1 file changed, 13 insertions(+), 6 deletions(-)
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index e916ba8..abe9560 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -6059,17 +6059,25 @@ void netif_stacked_transfer_operstate(const struct
> > net_device *rootdev,
> >  EXPORT_SYMBOL(netif_stacked_transfer_operstate);
> >  
> >  #ifdef CONFIG_SYSFS
> > +static void netif_free_rx_queues(struct net_device *dev)
> > +{
> > +	kvfree(dev->_rx);
> > +}
> > +
> 
> I would just open-code this.

I will make the changes with the next version.
Thanks,
Pankaj
> 
> >  static int netif_alloc_rx_queues(struct net_device *dev)
> >  {
> >  	unsigned int i, count = dev->num_rx_queues;
> >  	struct netdev_rx_queue *rx;
> > +	size_t sz = count * sizeof(*rx);
> >  
> >  	BUG_ON(count < 1);
> >  
> > -	rx = kcalloc(count, sizeof(struct netdev_rx_queue), GFP_KERNEL);
> > -	if (!rx)
> > -		return -ENOMEM;
> > -
> > +	rx = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> > +	if (!rx) {
> > +		rx = vzalloc(sz);
> > +		if (!rx)
> > +			return -ENOMEM;
> > +	}
> >  	dev->_rx = rx;
> >  
> >  	for (i = 0; i < count; i++)
> > @@ -6698,9 +6706,8 @@ void free_netdev(struct net_device *dev)
> >  
> >  	netif_free_tx_queues(dev);
> >  #ifdef CONFIG_SYSFS
> > -	kfree(dev->_rx);
> > +	netif_free_rx_queues(dev);
> >  #endif
> > -
> 
> and I think it's nicer with the empty line.
> 
> >  	kfree(rcu_dereference_protected(dev->ingress_queue, 1));
> >  
> >  	/* Flush device addresses */
> > --
> > 1.8.3.1
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


* Re: [PATCH v3 net-next 2/2 tuntap: Increase the number of queues in tun.
  2014-12-04 10:20       ` Michael S. Tsirkin
  2014-12-04 10:42         ` Pankaj Gupta
@ 2014-12-05  7:35         ` Jason Wang
  2014-12-10  7:56           ` Pankaj Gupta
  1 sibling, 1 reply; 11+ messages in thread
From: Jason Wang @ 2014-12-05  7:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Pankaj Gupta, linux-kernel, netdev, davem, dgibson, vfalico,
	edumazet, vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings,
	xii, stephen, jiri, sergei.shtylyov


On 12/04/2014 06:20 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 04, 2014 at 03:03:34AM +0008, Jason Wang wrote:
>> > 
>> > 
>> > On Wed, Dec 3, 2014 at 5:52 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> > >On Wed, Dec 03, 2014 at 12:49:37PM +0530, Pankaj Gupta wrote:
>>>> > >> Networking under kvm works best if we allocate a per-vCPU RX and TX
>>>> > >> queue in a virtual NIC. This requires a per-vCPU queue on the host
>>>> > >>side.
>>>> > >> It is now safe to increase the maximum number of queues.
>>>> > >> Preceding patche: 'net: allow large number of rx queues'
>>> > >
>>> > >s/patche/patch/
>>> > >
>>>> > >> made sure this won't cause failures due to high order memory
>>>> > >> allocations. Increase it to 256: this is the max number of vCPUs
>>>> > >> KVM supports.
>>>> > >> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
>>>> > >> Reviewed-by: David Gibson <dgibson@redhat.com>
>>> > >
>>> > >Hmm it's kind of nasty that each tun device is now using x16 memory.
>>> > >Maybe we should look at using a flex array instead, and removing the
>>> > >limitation altogether (e.g. make it INT_MAX)?
>> > 
>> > But this only happens when IFF_MULTIQUEUE were used.
> I refer to this field:
>         struct tun_file __rcu   *tfiles[MAX_TAP_QUEUES];
> if we make MAX_TAP_QUEUES 256, this will use 4K bytes,
> apparently unconditionally.
>
>

How about just allocate one tfile if IFF_MULTIQUEUE were disabled?
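
A sketch of that idea (hypothetical helper, not a submitted patch; it assumes
the existing IFF_MULTI_QUEUE flag from the TUNSETIFF request and a dynamically
allocated pointer array in place of the fixed tfiles[] field):

static struct tun_file __rcu **tun_alloc_tfiles(short ifr_flags,
						unsigned int *max_queues)
{
	unsigned int n = (ifr_flags & IFF_MULTI_QUEUE) ? MAX_TAP_QUEUES : 1;

	*max_queues = n;
	/* a single slot for non-multiqueue devices; only multiqueue
	 * devices pay for the full MAX_TAP_QUEUES pointer array */
	return kcalloc(n, sizeof(struct tun_file __rcu *), GFP_KERNEL);
}

The single-queue case then costs one pointer instead of MAX_TAP_QUEUES of
them, at the price of one extra dereference when looking up a queue.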



* Re: [PATCH v3 net-next 2/2 tuntap: Increase the number of queues in tun.
  2014-12-05  7:35         ` Jason Wang
@ 2014-12-10  7:56           ` Pankaj Gupta
  0 siblings, 0 replies; 11+ messages in thread
From: Pankaj Gupta @ 2014-12-10  7:56 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: linux-kernel, netdev, davem, dgibson, vfalico, edumazet,
	vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings, xii,
	stephen, jiri, sergei shtylyov


> 
> On 12/04/2014 06:20 PM, Michael S. Tsirkin wrote:
> > On Thu, Dec 04, 2014 at 03:03:34AM +0008, Jason Wang wrote:
> >> > 
> >> > 
> >> > On Wed, Dec 3, 2014 at 5:52 PM, Michael S. Tsirkin <mst@redhat.com>
> >> > wrote:
> >>> > >On Wed, Dec 03, 2014 at 12:49:37PM +0530, Pankaj Gupta wrote:
> >>>> > >> Networking under kvm works best if we allocate a per-vCPU RX and TX
> >>>> > >> queue in a virtual NIC. This requires a per-vCPU queue on the host
> >>>> > >>side.
> >>>> > >> It is now safe to increase the maximum number of queues.
> >>>> > >> Preceding patche: 'net: allow large number of rx queues'
> >>> > >
> >>> > >s/patche/patch/
> >>> > >
> >>>> > >> made sure this won't cause failures due to high order memory
> >>>> > >> allocations. Increase it to 256: this is the max number of vCPUs
> >>>> > >> KVM supports.
> >>>> > >> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> >>>> > >> Reviewed-by: David Gibson <dgibson@redhat.com>
> >>> > >
> >>> > >Hmm it's kind of nasty that each tun device is now using x16 memory.
> >>> > >Maybe we should look at using a flex array instead, and removing the
> >>> > >limitation altogether (e.g. make it INT_MAX)?
> >> > 
> >> > But this only happens when IFF_MULTIQUEUE were used.
> > I refer to this field:
> >         struct tun_file __rcu   *tfiles[MAX_TAP_QUEUES];
> > if we make MAX_TAP_QUEUES 256, this will use 4K bytes,
> > apparently unconditionally.
> >
> >
> 
> How about just allocate one tfile if IFF_MULTIQUEUE were disabled?

Yes, we can also go with one tfile for the single queue case.

tfiles is an array of 'tun_file' pointers with size 256. For multiqueue we
would be using up to 256 queues, but for a single queue device having just
one tfile makes sense.

Also, we are making sure to avoid memory allocation failures with vzalloc().

> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

