* surprising memory request
@ 2013-01-18 16:58 Dirk Hohndel
  2013-01-18 17:46 ` Eric Dumazet
  2013-01-18 18:09 ` Waskiewicz Jr, Peter P
  0 siblings, 2 replies; 12+ messages in thread
From: Dirk Hohndel @ 2013-01-18 16:58 UTC (permalink / raw)
  To: netdev; +Cc: David Woodhouse


Running openconnect on a very recent 3.8 (a few commits before Linus cut
RC4) I get this allocation failure. I'm unclear why we would need 128
contiguous pages here...

/D

[66015.673818] openconnect: page allocation failure: order:7, mode:0x10c0d0
[66015.673827] Pid: 3292, comm: openconnect Tainted: G        W    3.8.0-rc3-00352-gdfdebc2 #94
[66015.673830] Call Trace:
[66015.673841]  [<ffffffff810e9c29>] warn_alloc_failed+0xe9/0x140
[66015.673849]  [<ffffffff81093967>] ? on_each_cpu_mask+0x87/0xa0
[66015.673854]  [<ffffffff810ec349>] __alloc_pages_nodemask+0x579/0x720
[66015.673859]  [<ffffffff810ec507>] __get_free_pages+0x17/0x50
[66015.673866]  [<ffffffff81123979>] kmalloc_order_trace+0x39/0xf0
[66015.673874]  [<ffffffff81666178>] ? __hw_addr_add_ex+0x78/0xc0
[66015.673879]  [<ffffffff811260d8>] __kmalloc+0xc8/0x180
[66015.673883]  [<ffffffff81666616>] ? dev_addr_init+0x66/0x90
[66015.673889]  [<ffffffff81660985>] alloc_netdev_mqs+0x145/0x300
[66015.673896]  [<ffffffff81513830>] ? tun_net_fix_features+0x20/0x20
[66015.673902]  [<ffffffff815168aa>] __tun_chr_ioctl+0xd0a/0xec0
[66015.673908]  [<ffffffff81516a93>] tun_chr_ioctl+0x13/0x20
[66015.673913]  [<ffffffff8113b197>] do_vfs_ioctl+0x97/0x530
[66015.673917]  [<ffffffff811256f3>] ? kmem_cache_free+0x33/0x170
[66015.673923]  [<ffffffff81134896>] ? final_putname+0x26/0x50
[66015.673927]  [<ffffffff8113b6c1>] sys_ioctl+0x91/0xb0
[66015.673935]  [<ffffffff8180e3d2>] system_call_fastpath+0x16/0x1b
[66015.673938] Mem-Info:
[66015.673940] DMA per-cpu:
[66015.673943] CPU    0: hi:    0, btch:   1 usd:   0
[66015.673945] CPU    1: hi:    0, btch:   1 usd:   0
[66015.673947] CPU    2: hi:    0, btch:   1 usd:   0
[66015.673949] CPU    3: hi:    0, btch:   1 usd:   0
[66015.673951] DMA32 per-cpu:
[66015.673953] CPU    0: hi:  186, btch:  31 usd:   0
[66015.673956] CPU    1: hi:  186, btch:  31 usd:  42
[66015.673958] CPU    2: hi:  186, btch:  31 usd:   0
[66015.673960] CPU    3: hi:  186, btch:  31 usd:   0
[66015.673962] Normal per-cpu:
[66015.673964] CPU    0: hi:  186, btch:  31 usd:   0
[66015.673966] CPU    1: hi:  186, btch:  31 usd:  46
[66015.673968] CPU    2: hi:  186, btch:  31 usd:   0
[66015.673970] CPU    3: hi:  186, btch:  31 usd:   0
[66015.673976] active_anon:1241168 inactive_anon:243402 isolated_anon:0
[66015.673976]  active_file:171470 inactive_file:184344 isolated_file:0
[66015.673976]  unevictable:41 dirty:2 writeback:0 unstable:0
[66015.673976]  free:84294 slab_reclaimable:45897 slab_unreclaimable:8765
[66015.673976]  mapped:37635 shmem:185852 pagetables:18316 bounce:0
[66015.673976]  free_cma:0
[66015.673987] DMA free:15900kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15644kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[66015.673989] lowmem_reserve[]: 0 2138 7924 7924
[66015.674001] DMA32 free:308944kB min:3068kB low:3832kB high:4600kB active_anon:789936kB inactive_anon:270492kB active_file:302520kB inactive_file:346788kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2190060kB managed:2136000kB mlocked:0kB dirty:0kB writeback:0kB mapped:36180kB shmem:227796kB slab_reclaimable:111048kB slab_unreclaimable:2632kB kernel_stack:376kB pagetables:6084kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:21 all_unreclaimable? no
[66015.674004] lowmem_reserve[]: 0 0 5786 5786
[66015.674014] Normal free:12332kB min:8308kB low:10384kB high:12460kB active_anon:4174736kB inactive_anon:703116kB active_file:383360kB inactive_file:390588kB unevictable:164kB isolated(anon):0kB isolated(file):0kB present:5925024kB managed:5874208kB mlocked:164kB dirty:8kB writeback:0kB mapped:114360kB shmem:515612kB slab_reclaimable:72540kB slab_unreclaimable:32428kB kernel_stack:4264kB pagetables:67180kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:46 all_unreclaimable? no
[66015.674016] lowmem_reserve[]: 0 0 0 0
[66015.674021] DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 2*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15900kB
[66015.674040] DMA32: 4730*4kB (UEM) 14656*8kB (UEM) 6007*16kB (UEM) 1577*32kB (UEMR) 269*64kB (UEMR) 68*128kB (UEMR) 2*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 309176kB
[66015.674057] Normal: 2772*4kB (UEM) 44*8kB (UEM) 17*16kB (UEM) 6*32kB (R) 1*64kB (R) 0*128kB 1*256kB (R) 1*512kB (R) 0*1024kB 0*2048kB 0*4096kB = 12736kB
[66015.674074] 541777 total pagecache pages
[66015.674075] 142 pages in swap cache
[66015.674078] Swap cache stats: add 4420, delete 4278, find 106/118
[66015.674080] Free swap  = 9197216kB
[66015.674082] Total swap = 9213948kB
[66015.710738] 2094576 pages RAM
[66015.710745] 85536 pages reserved
[66015.710746] 1755663 pages shared
[66015.710748] 1488520 pages non-shared
[66015.710752] netdev: Unable to allocate 1024 tx queues


-- 
Dirk Hohndel
Intel Open Source Technology Center


* Re: surprising memory request
  2013-01-18 16:58 surprising memory request Dirk Hohndel
@ 2013-01-18 17:46 ` Eric Dumazet
  2013-01-18 17:52   ` Stephen Hemminger
                     ` (3 more replies)
  2013-01-18 18:09 ` Waskiewicz Jr, Peter P
  1 sibling, 4 replies; 12+ messages in thread
From: Eric Dumazet @ 2013-01-18 17:46 UTC (permalink / raw)
  To: Dirk Hohndel, Jason Wang; +Cc: netdev, David Woodhouse

On Fri, 2013-01-18 at 08:58 -0800, Dirk Hohndel wrote:
> Running openconnect on a very recent 3.8 (a few commits before Linus cut
> RC4) I get this allocation failure. I'm unclear why we would need 128
> contiguous pages here...
> 
> /D
> 
> [...]

That's because Jason thought that the tun device had to have an insane
number of queues to get good performance.

#define MAX_TAP_QUEUES 1024

That's crazy if your machine has, say, 8 CPUs.

And Jason didn't adapt the memory allocations done in
alloc_netdev_mqs() to fall back to vmalloc() when kmalloc()
fails.

commit c8d68e6be1c3b242f1c598595830890b65cea64a
Author: Jason Wang <jasowang@redhat.com>
Date:   Wed Oct 31 19:46:00 2012 +0000

    tuntap: multiqueue support
    
    This patch converts tun/tap to a multiqueue device and exposes the
    multiqueue queues as multiple file descriptors to userspace. Internally,
    each tun_file is abstracted as a queue, and an array of pointers to
    tun_file structures is stored in the tun_struct device, so multiple
    tun_files can be attached to the device as multiple queues.
    
    When choosing the txq, we first try to identify a flow through its
    rxhash; if it does not have one, we try the recorded rxq and then use
    that to choose the transmit queue. This policy may be changed in the
    future.
    
    Signed-off-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
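
A fallback of the kind described above might look like this (an
illustrative, untested sketch -- the helper names are invented here;
3.8 has no kvzalloc()/kvfree(), so it is open-coded):

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/* Try kmalloc() first; fall back to vmalloc() so a large queue
 * array does not need physically contiguous pages. */
static void *netdev_queues_zalloc(size_t size)
{
	void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);

	return p ? p : vzalloc(size);
}

static void netdev_queues_free(void *p)
{
	if (is_vmalloc_addr(p))
		vfree(p);
	else
		kfree(p);
}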


* Re: surprising memory request
  2013-01-18 17:46 ` Eric Dumazet
@ 2013-01-18 17:52   ` Stephen Hemminger
  2013-01-21  5:44     ` Jason Wang
  2013-01-18 17:54   ` Eric Dumazet
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2013-01-18 17:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Dirk Hohndel, Jason Wang, netdev, David Woodhouse

On Fri, 18 Jan 2013 09:46:30 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Fri, 2013-01-18 at 08:58 -0800, Dirk Hohndel wrote:
> > Running openconnect on a very recent 3.8 (a few commits before Linus cut
> > RC4) I get this allocation failure. I'm unclear why we would need 128
> > contiguous pages here...
> > 
> > /D
> > 
> > [...]
> 
> That's because Jason thought that the tun device had to have an insane
> number of queues to get good performance.
> 
> #define MAX_TAP_QUEUES 1024
> 
> That's crazy if your machine has, say, 8 CPUs.
> 
> And Jason didn't adapt the memory allocations done in
> alloc_netdev_mqs() to fall back to vmalloc() when kmalloc()
> fails.
> 
> commit c8d68e6be1c3b242f1c598595830890b65cea64a ("tuntap: multiqueue support")
> [...]

Also the tuntap device now has its own flow cache, which is also a bad idea.
Why not just 128 queues and a hash like SFQ?
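
For what it's worth, the array-indexed scheme alluded to above would
look something like this (a sketch, not code from this thread):

#include <linux/types.h>

/* SFQ-style direct indexing: derive the queue straight from the
 * packet hash, with no per-flow state at all. Cheap, but two flows
 * whose hashes collide modulo the queue count share a slot -- the
 * accuracy concern Jason raises later in the thread. */
static inline u16 pick_txq(u32 rxhash, u16 numqueues)
{
	return rxhash % numqueues;
}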


* Re: surprising memory request
  2013-01-18 17:46 ` Eric Dumazet
  2013-01-18 17:52   ` Stephen Hemminger
@ 2013-01-18 17:54   ` Eric Dumazet
  2013-01-21  5:21     ` Jason Wang
  2013-01-18 17:59   ` David Woodhouse
  2013-01-21  5:13   ` Jason Wang
  3 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2013-01-18 17:54 UTC (permalink / raw)
  To: Dirk Hohndel; +Cc: Jason Wang, netdev, David Woodhouse

On Fri, 2013-01-18 at 09:46 -0800, Eric Dumazet wrote:

> That's because Jason thought that the tun device had to have an insane
> number of queues to get good performance.
> 
> #define MAX_TAP_QUEUES 1024
> 
> That's crazy if your machine has, say, 8 CPUs.
> 
> And Jason didn't adapt the memory allocations done in
> alloc_netdev_mqs() to fall back to vmalloc() when kmalloc()
> fails.

I suggest the more reasonable:

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index c81680d..ec18fbf 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -113,7 +113,7 @@ struct tap_filter {
  * the order of 100-200 CPUs so this leaves us some breathing space if we want
  * to match a queue per guest CPU.
  */
-#define MAX_TAP_QUEUES 1024
+#define MAX_TAP_QUEUES DEFAULT_MAX_NUM_RSS_QUEUES
 
 #define TUN_FLOW_EXPIRE (3 * HZ)
 

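For reference, the macro Eric substitutes comes from
include/linux/netdevice.h:

#define DEFAULT_MAX_NUM_RSS_QUEUES	(8)

so this change would cap tun at 8 queues -- the value Jason questions
later in the thread.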

* Re: surprising memory request
  2013-01-18 17:46 ` Eric Dumazet
  2013-01-18 17:52   ` Stephen Hemminger
  2013-01-18 17:54   ` Eric Dumazet
@ 2013-01-18 17:59   ` David Woodhouse
  2013-01-20 13:06     ` Ben Hutchings
  2013-01-21  5:23     ` Jason Wang
  2013-01-21  5:13   ` Jason Wang
  3 siblings, 2 replies; 12+ messages in thread
From: David Woodhouse @ 2013-01-18 17:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Dirk Hohndel, Jason Wang, netdev


On Fri, 2013-01-18 at 09:46 -0800, Eric Dumazet wrote:
> 
> #define MAX_TAP_QUEUES 1024
> 
> Thats crazy if your machine has say 8 cpus.

Even crazier if your userspace is never going to *use* MQ. Can't we
default to one queue unless userspace explicitly requests more?

-- 
dwmw2




* Re: surprising memory request
  2013-01-18 16:58 surprising memory request Dirk Hohndel
  2013-01-18 17:46 ` Eric Dumazet
@ 2013-01-18 18:09 ` Waskiewicz Jr, Peter P
  2013-01-18 18:11   ` Waskiewicz Jr, Peter P
  1 sibling, 1 reply; 12+ messages in thread
From: Waskiewicz Jr, Peter P @ 2013-01-18 18:09 UTC (permalink / raw)
  To: Dirk Hohndel; +Cc: netdev, David Woodhouse

On Fri, Jan 18, 2013 at 08:58:18AM -0800, Dirk Hohndel wrote:
> 
> Running openconnect on a very recent 3.8 (a few commits before Linus cut
> RC4) I get this allocation failure. I'm unclear why we would need 128
> contiguous pages here...
> 
> /D

[...]

> [66015.674074] 541777 total pagecache pages
> [66015.674075] 142 pages in swap cache
> [66015.674078] Swap cache stats: add 4420, delete 4278, find 106/118
> [66015.674080] Free swap  = 9197216kB
> [66015.674082] Total swap = 9213948kB
> [66015.710738] 2094576 pages RAM
> [66015.710745] 85536 pages reserved
> [66015.710746] 1755663 pages shared
> [66015.710748] 1488520 pages non-shared
> [66015.710752] netdev: Unable to allocate 1024 tx queues

What device are you using that is trying to allocate so many Tx queues?
I assume this is coming from the VPN device coming online; is SELinux
in enforcing mode?  There have been a number of recent changes in the
tun driver around the multiqueue area that involve SELinux, so this is
just a theory at this point.  I can keep digging.

Cheers,
-PJ


* Re: surprising memory request
  2013-01-18 18:09 ` Waskiewicz Jr, Peter P
@ 2013-01-18 18:11   ` Waskiewicz Jr, Peter P
  0 siblings, 0 replies; 12+ messages in thread
From: Waskiewicz Jr, Peter P @ 2013-01-18 18:11 UTC (permalink / raw)
  To: Dirk Hohndel; +Cc: netdev, David Woodhouse

On Fri, Jan 18, 2013 at 10:09:30AM -0800, Waskiewicz Jr, Peter P wrote:
> [...]
> 
> What device are you using that is trying to allocate so many Tx queues?
> I assume this is coming from the VPN device coming online; is SELinux
> in enforcing mode?  There have been a number of recent changes in the
> tun driver around the multiqueue area that involve SELinux, so this is
> just a theory at this point.  I can keep digging.

Or I could have just read Eric's mail before replying. Never mind.


* Re: surprising memory request
  2013-01-18 17:59   ` David Woodhouse
@ 2013-01-20 13:06     ` Ben Hutchings
  2013-01-21  5:23     ` Jason Wang
  1 sibling, 0 replies; 12+ messages in thread
From: Ben Hutchings @ 2013-01-20 13:06 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Eric Dumazet, Dirk Hohndel, Jason Wang, netdev

On Fri, 2013-01-18 at 17:59 +0000, David Woodhouse wrote:
> On Fri, 2013-01-18 at 09:46 -0800, Eric Dumazet wrote:
> > 
> > #define MAX_TAP_QUEUES 1024
> > 
> > Thats crazy if your machine has say 8 cpus.
> 
> Even crazier if your userspace is never going to *use* MQ. Can't we
> default to one queue unless userspace explicitly requests more?

I don't know about tun's internal TX queue structures, but the core TX
queue structures immediately follow struct net_device, and you have to
set a maximum when allocating it.
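
To put numbers on that, the following arithmetic reproduces the order-7
request (illustrative only -- the per-queue size is an assumption, not
measured from the 3.8 tree):

#include <stdio.h>

/* With 1024 TX queues and a cacheline-aligned struct netdev_queue of
 * roughly 512 bytes, the queue array needs about 512 KiB of
 * physically contiguous memory -- the order-7 (2^7 = 128 page)
 * allocation in Dirk's report. */
int main(void)
{
	unsigned long txqs = 1024, qsize = 512, page = 4096;
	unsigned long bytes = txqs * qsize;
	unsigned int order = 0;

	while ((page << order) < bytes)
		order++;
	printf("%lu bytes -> order %u (%lu pages)\n",
	       bytes, order, 1UL << order);	/* 524288 -> order 7 */
	return 0;
}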

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: surprising memory request
  2013-01-18 17:46 ` Eric Dumazet
                     ` (2 preceding siblings ...)
  2013-01-18 17:59   ` David Woodhouse
@ 2013-01-21  5:13   ` Jason Wang
  3 siblings, 0 replies; 12+ messages in thread
From: Jason Wang @ 2013-01-21  5:13 UTC (permalink / raw)
  To: Eric Dumazet, Dirk Hohndel; +Cc: netdev, David Woodhouse

On Friday, January 18, 2013 09:46:30 AM Eric Dumazet wrote:
> On Fri, 2013-01-18 at 08:58 -0800, Dirk Hohndel wrote:
> > Running openconnect on a very recent 3.8 (a few commits before Linus cut
> > RC4) I get this allocation failure. I'm unclear why we would need 128
> > contiguous pages here...
> > 
> > /D
> > 
> > [...]
> 
> That's because Jason thought that the tun device had to have an insane
> number of queues to get good performance.
> 
> #define MAX_TAP_QUEUES 1024
> 
> That's crazy if your machine has, say, 8 CPUs.
> 
> And Jason didn't adapt the memory allocations done in
> alloc_netdev_mqs() to fall back to vmalloc() when kmalloc()
> fails.
> 

Right, it looks like alloc_netdev_mqs() uses kmalloc() to request a
huge number of contiguous pages for the netdev_queue and
netdev_rx_queue arrays. Most of them are not needed when MQ is not
enabled. I wonder whether we can avoid the contiguous page allocation
in alloc_netdev_mqs() by using a flex array instead of kmalloc() to
allocate the rx/tx queues, as sketched below.

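A minimal sketch of the flex array idea (illustrative and untested --
not a patch from this thread; it would also mean converting every
direct indexing of the queue array over to flex_array_get()):

#include <linux/flex_array.h>
#include <linux/netdevice.h>

/* Allocate the TX queue table as page-sized chunks instead of one
 * large physically contiguous kmalloc(). */
static struct flex_array *netif_alloc_tx_flex(unsigned int count)
{
	struct flex_array *fa;

	fa = flex_array_alloc(sizeof(struct netdev_queue), count,
			      GFP_KERNEL);
	if (!fa)
		return NULL;

	/* Back every element with real pages up front so that later
	 * lookups via flex_array_get() cannot fail. */
	if (flex_array_prealloc(fa, 0, count, GFP_KERNEL | __GFP_ZERO)) {
		flex_array_free(fa);
		return NULL;
	}
	return fa;
}
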
Dirk, could you please try the following patch, which only allocates
one queue when IFF_MULTI_QUEUE is not specified?

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 3f011e0..734085e 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1589,6 +1589,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 	else {
 		char *name;
 		unsigned long flags = 0;
+		int queues = ifr->ifr_flags & IFF_MULTI_QUEUE ?
+			     MAX_TAP_QUEUES : 1;
 
 		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
@@ -1612,8 +1614,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 			name = ifr->ifr_name;
 
 		dev = alloc_netdev_mqs(sizeof(struct tun_struct), name,
-				       tun_setup,
-				       MAX_TAP_QUEUES, MAX_TAP_QUEUES);
+				       tun_setup, queues, queues);
 		if (!dev)
 			return -ENOMEM;

> commit c8d68e6be1c3b242f1c598595830890b65cea64a ("tuntap: multiqueue support")
> [...]


* Re: surprising memory request
  2013-01-18 17:54   ` Eric Dumazet
@ 2013-01-21  5:21     ` Jason Wang
  0 siblings, 0 replies; 12+ messages in thread
From: Jason Wang @ 2013-01-21  5:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Dirk Hohndel, netdev, David Woodhouse

On 01/19/2013 01:54 AM, Eric Dumazet wrote:
> On Fri, 2013-01-18 at 09:46 -0800, Eric Dumazet wrote:
>
>> That's because Jason thought that the tun device had to have an insane
>> number of queues to get good performance.
>>
>> #define MAX_TAP_QUEUES 1024
>>
>> That's crazy if your machine has, say, 8 CPUs.
>>
>> And Jason didn't adapt the memory allocations done in
>> alloc_netdev_mqs() to fall back to vmalloc() when kmalloc()
>> fails.
> I suggest the more reasonable:
> I suggest using the more reasonable :
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index c81680d..ec18fbf 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -113,7 +113,7 @@ struct tap_filter {
>   * the order of 100-200 CPUs so this leaves us some breathing space if we want
>   * to match a queue per guest CPU.
>   */
> -#define MAX_TAP_QUEUES 1024
> +#define MAX_TAP_QUEUES DEFAULT_MAX_NUM_RSS_QUEUES
>  
>  #define TUN_FLOW_EXPIRE (3 * HZ)
>  

But its default value of 8 is a little too small: we can easily have a
KVM guest with more than 8 vcpus, and a host multiqueue card with more
than 8 queues. Maybe we can use num_possible_cpus(), or just an
arbitrary number such as 256, which seems large enough.




* Re: surprising memory request
  2013-01-18 17:59   ` David Woodhouse
  2013-01-20 13:06     ` Ben Hutchings
@ 2013-01-21  5:23     ` Jason Wang
  1 sibling, 0 replies; 12+ messages in thread
From: Jason Wang @ 2013-01-21  5:23 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Eric Dumazet, Dirk Hohndel, netdev

On 01/19/2013 01:59 AM, David Woodhouse wrote:
> On Fri, 2013-01-18 at 09:46 -0800, Eric Dumazet wrote:
>> #define MAX_TAP_QUEUES 1024
>>
>> Thats crazy if your machine has say 8 cpus.
> Even crazier if your userspace is never going to *use* MQ. Can't we
> default to one queue unless userspace explicitly requests more?
>

Yes, if userspace never uses MQ, we just use a one-queue netdevice. I
have just drafted a patch for Dirk to test.

Thanks


* Re: surprising memory request
  2013-01-18 17:52   ` Stephen Hemminger
@ 2013-01-21  5:44     ` Jason Wang
  0 siblings, 0 replies; 12+ messages in thread
From: Jason Wang @ 2013-01-21  5:44 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Eric Dumazet, Dirk Hohndel, netdev, David Woodhouse

On 01/19/2013 01:52 AM, Stephen Hemminger wrote:
> On Fri, 18 Jan 2013 09:46:30 -0800
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> On Fri, 2013-01-18 at 08:58 -0800, Dirk Hohndel wrote:
>>> Running openconnect on a very recent 3.8 (a few commits before Linus cut
>>> RC4) I get this allocation failure. I'm unclear why we would need 128
>>> contiguous pages here...
>>>
>>> /D
>>>
>>> [...]
>> That's because Jason thought that the tun device had to have an insane
>> number of queues to get good performance.
>>
>> #define MAX_TAP_QUEUES 1024
>>
>> That's crazy if your machine has, say, 8 CPUs.
>>
>> And Jason didn't adapt the memory allocations done in
>> alloc_netdev_mqs() to fall back to vmalloc() when kmalloc()
>> fails.
>>
>> commit c8d68e6be1c3b242f1c598595830890b65cea64a ("tuntap: multiqueue support")
>> [...]
> Also the tuntap device now has its own flow cache, which is also a bad idea.
> Why not just 128 queues and a hash like SFQ?

Hi Stephen:

I understand your concerns. I think we can solve this by limiting the
number of flow cache entries to a fixed value (say 4096). With that,
the average worst-case search depth is 4, which addresses the case of
many short-lived connections.

The issue with just an array of 128 entries is that the matching is
not accurate. With an array of limited size, two different flows can
easily collide on the same index, which may cause the packets of a
flow to move back and forth between queues. Ideally we would need a
perfect filter doing comparisons on the n-tuple, which may be too
expensive for a software device such as tun, so I chose to store the
rxhash in the flow cache entries and use a hash list to do the
matching.
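
The rough shape of that scheme, for illustration (field and function
names are paraphrased here, not quoted from tun.c):

#include <linux/types.h>
#include <linux/rculist.h>

struct tun_flow_entry {
	struct hlist_node hash_link;	/* bucket chain */
	u32 rxhash;			/* full receive hash of the flow */
	u16 queue_index;		/* queue this flow is pinned to */
	unsigned long updated;		/* for ageing out idle flows */
};

/* Bucket selection can still collide, but comparing the full rxhash
 * keeps two colliding flows from stealing each other's queue. */
static struct tun_flow_entry *tun_flow_find(struct hlist_head *head,
					    u32 rxhash)
{
	struct tun_flow_entry *e;
	struct hlist_node *n;

	hlist_for_each_entry_rcu(e, n, head, hash_link) {
		if (e->rxhash == rxhash)
			return e;
	}
	return NULL;
}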

Thanks



