All of lore.kernel.org
 help / color / mirror / Atom feed
* 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
@ 2012-03-11 18:32 ` Marc MERLIN
  2012-03-29  5:31   ` Marc MERLIN
  0 siblings, 1 reply; 72+ messages in thread
From: Marc MERLIN @ 2012-03-11 18:32 UTC (permalink / raw)
  To: Linux Kernel mailing List

I still need some help narrowing this bug down so that I can give a
better report.

First, the problem is not new to 3.2.8, it appeared after I upgraded
my laptop to a new distribution and switched from 32bits to 64bit kernel
and userland. This in turn seems to be causing memory allocation problems.

When I make a big NFS copy, apparently interrupts don't get serviced
after a while, and my X screen won't update itself (clock) and my mouse
cursor won't move.
Then, after maybe a minute or more, it usually recovers for a few
seconds, and then locks again. The copy usually finishes eventually
if I leave the laptop for a few hours on its own while the UI is
unresponsive.
I've had the problem with both wired and wireless copies, and tried
doing a copy with smbmount instead of nfs, and the issue was similar.

I first had a vol prempt kernel, and tried recompiling with preempt and
that did not help.
My kernel config is here: http://marc.merlins.org/tmp/config.txt

Last night, I started a big copy, and this morning found the copy half
hung. The laptop was responsive, but while I could ping, TCP connections
would put the process in unkillable kernel hung state.

Below are some sysreq dumps I took (syslog to local disk was still
working fine). I know I have Tainted 'G', and I have no idea where that
came from, sorry :-/

It looks like I may run out of some kind of memory which in turn is
deadlocking some drivers?
(actual total memory is fine, user apps do not get OOM'ed and 'free' looked fine)

I ran the relevant sysrq commands wihch are at 
http://marc.merlins.org/tmp/sysrq.txt
(too big to paste here) 

But lots of tasks are hung like so:
[28451.191115] WorkerPool/1248 D ffff88013bc93580     0 12483   3740 0x00000080
[28451.191115]  ffff8801189ba100 0000000000000082 0000000000000000 ffff880134f2e180
[28451.191115]  0000000000013580 ffff88001614bfd8 ffff88001614bfd8 ffff8801189ba100
[28451.191115]  ffffffff811b4b62 000000010164525a 0000000000000046 ffffffff8165a250
[28451.191115] Call Trace:
[28451.191115]  [<ffffffff811b4b62>] ? sha_transform+0x395/0x1209
[28451.191115]  [<ffffffff8134a9b4>] ? __mutex_lock_common.isra.6+0x13d/0x219
[28451.191115]  [<ffffffff81242714>] ? extract_buf+0x86/0xf2
[28451.191115]  [<ffffffff8134a7e6>] ? mutex_lock+0xf/0x1f
[28451.191115]  [<ffffffff81298979>] ? rtnetlink_rcv+0xe/0x28
[28451.191115]  [<ffffffff812ad007>] ? netlink_unicast+0xe6/0x14e
[28451.191115]  [<ffffffff812ad26b>] ? netlink_sendmsg+0x1fc/0x237
[28451.191115]  [<ffffffff8127c770>] ? sock_sendmsg+0xc1/0xde
[28451.191115]  [<ffffffff810eca23>] ? __cache_free.isra.40+0x19/0x1a7
[28451.191115]  [<ffffffff813496be>] ? nl_pid_hash_rehash+0xc8/0xef
[28451.191115]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
[28451.191115]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
[28451.191115]  [<ffffffff8134e1d2>] ? sub_preempt_count+0x83/0x94
[28451.191115]  [<ffffffff810fd81e>] ? fget_light+0x85/0x8d
[28451.191115]  [<ffffffff8127e0e3>] ? sys_sendto+0xf7/0x137
[28451.191115]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
[28451.191115]  [<ffffffff8134e1d2>] ? sub_preempt_count+0x83/0x94
[28451.191115]  [<ffffffff8134b725>] ? _raw_spin_unlock+0x24/0x30
[28451.191115]  [<ffffffff8108d73e>] ? audit_syscall_entry+0x105/0x130
[28451.191115]  [<ffffffff8134fd52>] ? system_call_fastpath+0x16/0x1b



Below are lines I got in syslog during the copy.
Highlight is:
[ 4437.367046] kworker/1:1: page allocation failure: order:1, mode:0x20
and then:
[ 8640.516177] INFO: task flush-0:37:7122 blocked for more than 120 seconds.
and then 120,000 lines(!) of:
[ 9654.042164] ieee80211 phy0: failed to reallocate TX buffer

unedited lines below.

So, any idea of what I can try next?

Thanks,
Marc


[ 4437.367046] kworker/1:1: page allocation failure: order:1, mode:0x20
[ 4437.367053] Pid: 8067, comm: kworker/1:1 Tainted: G           O 3.2.8-amd64-volpreempt-noide-20120208 #1
[ 4437.367056] Call Trace:
[ 4437.367058]  <IRQ>  [<ffffffff810b9ec0>] ? warn_alloc_failed+0x11f/0x132
[ 4437.367074]  [<ffffffff810bcdaa>] ? __alloc_pages_nodemask+0x6b1/0x72f
[ 4437.367081]  [<ffffffff810ec911>] ? kmem_getpages+0x4c/0xd9
[ 4437.367086]  [<ffffffff810ec911>] ? kmem_getpages+0x4c/0xd9
[ 4437.367090]  [<ffffffff810edd21>] ? fallback_alloc+0x123/0x1c2
[ 4437.367096]  [<ffffffff812846db>] ? pskb_expand_head+0xe0/0x24a
[ 4437.367101]  [<ffffffff810ee215>] ? __kmalloc+0xb2/0x10a
[ 4437.367105]  [<ffffffff812846db>] ? pskb_expand_head+0xe0/0x24a
[ 4437.367139]  [<ffffffffa03e22c1>] ? ieee80211_skb_resize+0x64/0x9d [mac80211]
[ 4437.367154]  [<ffffffffa03e4252>] ? ieee80211_subif_start_xmit+0x705/0x883 [mac80211]
[ 4437.367175]  [<ffffffff8128e767>] ? dev_hard_start_xmit+0x40b/0x552
[ 4437.367179]  [<ffffffff812a4adc>] ? sch_direct_xmit+0x63/0x13a
[ 4437.367182]  [<ffffffff8128eb8e>] ? dev_queue_xmit+0x2e0/0x4b5
[ 4437.367185]  [<ffffffff812b764d>] ? ip_finish_output2+0x1c7/0x218
[ 4437.367188]  [<ffffffff812b86aa>] ? __ip_flush_pending_frames.isra.29+0x69/0x69
[ 4437.367191]  [<ffffffff812b8a6a>] ? ip_queue_xmit+0x2cd/0x30d
[ 4437.367195]  [<ffffffff81066be9>] ? getnstimeofday+0x4a/0x7b
[ 4437.367198]  [<ffffffff812ca1d2>] ? tcp_transmit_skb+0x6d7/0x70a
[ 4437.367201]  [<ffffffff812cac5f>] ? tcp_write_xmit+0x698/0x7a1
[ 4437.367204]  [<ffffffff812c77bf>] ? tcp_ack+0x14e3/0x1658
[ 4437.367207]  [<ffffffff812c89bd>] ? tcp_established_options+0x2b/0x9e
[ 4437.367210]  [<ffffffff812cada9>] ? __tcp_push_pending_frames+0x18/0x44
[ 4437.367213]  [<ffffffff812c4e27>] ? tcp_data_snd_check+0x2c/0xfd
[ 4437.367216]  [<ffffffff812c86c5>] ? tcp_rcv_established+0x4f0/0x549
[ 4437.367220]  [<ffffffff8103ec39>] ? select_task_rq_fair+0x67b/0x690
[ 4437.367223]  [<ffffffff812ce735>] ? tcp_v4_do_rcv+0x166/0x323
[ 4437.367226]  [<ffffffff812cfdce>] ? tcp_v4_rcv+0x404/0x65d
[ 4437.367230]  [<ffffffff812b4d55>] ? ip_local_deliver_finish+0x148/0x1ba
[ 4437.367233]  [<ffffffff8128cfa4>] ? __netif_receive_skb+0x3f2/0x43f
[ 4437.367236]  [<ffffffff8128d31d>] ? netif_receive_skb+0x7e/0x84
[ 4437.367239]  [<ffffffff8128d7dd>] ? napi_gro_receive+0x1c/0x29
[ 4437.367241]  [<ffffffff8128d398>] ? napi_skb_finish+0x1c/0x31
[ 4437.367253]  [<ffffffffa026bde3>] ? e1000_clean_rx_irq+0x1f3/0x290 [e1000e]
[ 4437.367261]  [<ffffffffa026c26c>] ? e1000_clean+0x69/0x208 [e1000e]
[ 4437.367264]  [<ffffffff8128d8fb>] ? net_rx_action+0xa4/0x1c0
[ 4437.367268]  [<ffffffff8104c581>] ? __do_softirq+0xc0/0x188
[ 4437.367272]  [<ffffffff81351fac>] ? call_softirq+0x1c/0x30
[ 4437.367276]  [<ffffffff8100f98d>] ? do_softirq+0x3c/0x7b
[ 4437.367278]  [<ffffffff8104c87c>] ? irq_exit+0x3d/0xa7
[ 4437.367281]  [<ffffffff8100f6b4>] ? do_IRQ+0x81/0x97
[ 4437.367285]  [<ffffffff8134ba2e>] ? common_interrupt+0x6e/0x6e
[ 4437.367287]  <EOI>  [<ffffffffa008b32c>] ? dec128+0x434/0x80c [aes_x86_64]
[ 4437.367307]  [<ffffffffa0085164>] ? crypt+0xae/0x101 [xts]
[ 4437.367313]  [<ffffffffa008b712>] ? aes_decrypt+0xe/0xe [aes_x86_64]
[ 4437.367320]  [<ffffffffa008b704>] ? dec128+0x80c/0x80c [aes_x86_64]
[ 4437.367327]  [<ffffffffa00851f6>] ? decrypt+0x3f/0x44 [xts]
[ 4437.367331]  [<ffffffff8118cdb3>] ? async_decrypt+0x37/0x3c
[ 4437.367338]  [<ffffffffa0105e2a>] ? crypt_convert+0x22f/0x2c4 [dm_crypt]
[ 4437.367342]  [<ffffffff8100d02f>] ? load_TLS+0x7/0xa
[ 4437.367348]  [<ffffffffa01061b8>] ? kcryptd_crypt+0x56/0x342 [dm_crypt]
[ 4437.367352]  [<ffffffff81038cd2>] ? finish_task_switch+0x86/0xb7
[ 4437.367355]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
[ 4437.367358]  [<ffffffff8134e1d2>] ? sub_preempt_count+0x83/0x94
[ 4437.367361]  [<ffffffff8103612b>] ? need_resched+0x1a/0x23
[ 4437.367368]  [<ffffffffa0106162>] ? crypt_convert_init.isra.14+0x4f/0x4f [dm_crypt]
[ 4437.367372]  [<ffffffff8105b867>] ? process_one_work+0x16d/0x298
[ 4437.367375]  [<ffffffff8105c84a>] ? worker_thread+0xc2/0x145
[ 4437.367378]  [<ffffffff8105c788>] ? manage_workers.isra.23+0x15b/0x15b
[ 4437.367381]  [<ffffffff8105f9fe>] ? kthread+0x76/0x7e
[ 4437.367384]  [<ffffffff81351eb4>] ? kernel_thread_helper+0x4/0x10
[ 4437.367387]  [<ffffffff8105f988>] ? kthread_worker_fn+0x139/0x139
[ 4437.367390]  [<ffffffff81351eb0>] ? gs_change+0x13/0x13
[ 4437.367392] Mem-Info:
[ 4437.367393] Node 0 DMA per-cpu:
[ 4437.367396] CPU    0: hi:    0, btch:   1 usd:   0
[ 4437.367397] CPU    1: hi:    0, btch:   1 usd:   0
[ 4437.367399] Node 0 DMA32 per-cpu:
[ 4437.367401] CPU    0: hi:  186, btch:  31 usd: 164
[ 4437.367403] CPU    1: hi:  186, btch:  31 usd: 111
[ 4437.367405] Node 0 Normal per-cpu:
[ 4437.367407] CPU    0: hi:  186, btch:  31 usd: 114
[ 4437.367409] CPU    1: hi:  186, btch:  31 usd: 158
[ 4437.367413] active_anon:391300 inactive_anon:132951 isolated_anon:0
[ 4437.367414]  active_file:136666 inactive_file:140710 isolated_file:31
[ 4437.367415]  unevictable:1 dirty:3402 writeback:26688 unstable:7844
[ 4437.367416]  free:36509 slab_reclaimable:85289 slab_unreclaimable:35524
[ 4437.367417]  mapped:18088 shmem:35934 pagetables:9300 bounce:0
[ 4437.367419] Node 0 DMA free:15712kB min:260kB low:324kB high:388kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:36kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:36kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:160kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:40833 all_unreclaimable? yes
[ 4437.367428] lowmem_reserve[]: 0 2960 3907 3907
[ 4437.367432] Node 0 DMA32 free:110732kB min:51004kB low:63752kB high:76504kB active_anon:1380396kB inactive_anon:345140kB active_file:422008kB inactive_file:437440kB unevictable:4kB isolated(anon):0kB isolated(file):124kB present:3031688kB mlocked:4kB dirty:7148kB writeback:72004kB mapped:39424kB shmem:64836kB slab_reclaimable:212408kB slab_unreclaimable:80516kB kernel_stack:1720kB pagetables:19252kB unstable:23964kB bounce:0kB writeback_tmp:0kB pages_scanned:63 all_unreclaimable? no
[ 4437.367442] lowmem_reserve[]: 0 0 946 946
[ 4437.367445] Node 0 Normal free:19592kB min:16312kB low:20388kB high:24468kB active_anon:184804kB inactive_anon:186664kB active_file:124656kB inactive_file:125364kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:969600kB mlocked:0kB dirty:6460kB writeback:34712kB mapped:32928kB shmem:78900kB slab_reclaimable:128748kB slab_unreclaimable:61420kB kernel_stack:2792kB pagetables:17948kB unstable:7412kB bounce:0kB writeback_tmp:0kB pages_scanned:89 all_unreclaimable? no
[ 4437.367455] lowmem_reserve[]: 0 0 0 0
[ 4437.367458] Node 0 DMA: 2*4kB 1*8kB 1*16kB 0*32kB 1*64kB 2*128kB 2*256kB 1*512kB 2*1024kB 2*2048kB 2*4096kB = 15712kB
[ 4437.367467] Node 0 DMA32: 25961*4kB 73*8kB 8*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 110732kB
[ 4437.367475] Node 0 Normal: 4134*4kB 0*8kB 1*16kB 1*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 19656kB
[ 4437.367484] 317456 total pagecache pages
[ 4437.367485] 4042 pages in swap cache
[ 4437.367487] Swap cache stats: add 31786, delete 27744, find 10282/11070
[ 4437.367489] Free swap  = 4012560kB
[ 4437.367490] Total swap = 4106248kB
[ 4437.370978] 1032176 pages RAM
[ 4437.370978] 42834 pages reserved
[ 4437.370978] 390787 pages shared
[ 4437.370978] 750687 pages non-shared


[ 8640.516177] INFO: task flush-0:37:7122 blocked for more than 120 seconds.
[ 8640.516182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8640.516186] flush-0:37      D ffff88013bc93580     0  7122      2 0x00000080
[ 8640.516192]  ffff880072c28810 0000000000000046 ffff880100000000 ffff880134f2e180
[ 8640.516199]  0000000000013580 ffff88006d491fd8 ffff88006d491fd8 ffff880072c28810
[ 8640.516205]  ffff88013bfd1c50 000000018134b58b ffff88010c3cc1b0 ffff88006d491d18
[ 8640.516211] Call Trace:
[ 8640.516221]  [<ffffffff8110e81a>] ? inode_owner_or_capable+0x36/0x36
[ 8640.516226]  [<ffffffff8110e820>] ? inode_wait+0x6/0xa
[ 8640.516232]  [<ffffffff8134a72c>] ? __wait_on_bit+0x3e/0x71
[ 8640.516241]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
[ 8640.516245]  [<ffffffff81119674>] ? inode_wait_for_writeback+0xa2/0xc8
[ 8640.516249]  [<ffffffff810600c9>] ? autoremove_wake_function+0x2a/0x2a
[ 8640.516252]  [<ffffffff8111b4b4>] ? wb_writeback+0x226/0x255
[ 8640.516255]  [<ffffffff8134e27d>] ? add_preempt_count+0x9a/0x9c
[ 8640.516258]  [<ffffffff8111b8d4>] ? wb_do_writeback+0x150/0x1b2
[ 8640.516261]  [<ffffffff8111b9c5>] ? bdi_writeback_thread+0x8f/0x204
[ 8640.516264]  [<ffffffff8111b936>] ? wb_do_writeback+0x1b2/0x1b2
[ 8640.516266]  [<ffffffff8105f9fe>] ? kthread+0x76/0x7e
[ 8640.516270]  [<ffffffff81351eb4>] ? kernel_thread_helper+0x4/0x10
[ 8640.516273]  [<ffffffff8105f988>] ? kthread_worker_fn+0x139/0x139
[ 8640.516275]  [<ffffffff81351eb0>] ? gs_change+0x13/0x13
[ 8640.516281] INFO: task cp:7568 blocked for more than 120 seconds.
[ 8640.516283] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8640.516284] cp              D ffff88013bc13580     0  7568   6744 0x00000080
[ 8640.516288]  ffff880123976750 0000000000000082 0000000000000000 ffffffff8160d020
[ 8640.516292]  0000000000013580 ffff88001b3a9fd8 ffff88001b3a9fd8 ffff880123976750
[ 8640.516295]  0000000000000001 0000000181066767 ffff880131463e50 ffff88013bc13e08
[ 8640.516299] Call Trace:
[ 8640.516303]  [<ffffffff810b5d03>] ? __lock_page+0x66/0x66
[ 8640.516306]  [<ffffffff8134a2ec>] ? io_schedule+0x58/0x6f
[ 8640.516308]  [<ffffffff810b5d09>] ? sleep_on_page+0x6/0xa
[ 8640.516311]  [<ffffffff8134a72c>] ? __wait_on_bit+0x3e/0x71
[ 8640.516313]  [<ffffffff810b5e51>] ? wait_on_page_bit+0x6e/0x73
[ 8640.516316]  [<ffffffff810600c9>] ? autoremove_wake_function+0x2a/0x2a
[ 8640.516319]  [<ffffffff810b5f29>] ? filemap_fdatawait_range+0x74/0x139
[ 8640.516327]  [<ffffffff8111acab>] ? writeback_single_inode+0x155/0x2f4
[ 8640.516330]  [<ffffffff8111ae94>] ? sync_inode+0x4a/0x6f
[ 8640.516343]  [<ffffffffa06b9b02>] ? nfs_wb_all+0x39/0x3e [nfs]
[ 8640.516351]  [<ffffffffa06aeed1>] ? nfs_setattr+0x8e/0xf6 [nfs]
[ 8640.516354]  [<ffffffff811104c3>] ? notify_change+0x177/0x24f
[ 8640.516357]  [<ffffffff8111e85c>] ? utimes_common+0x10c/0x135
[ 8640.516361]  [<ffffffff810fd55a>] ? fget+0x50/0x57
[ 8640.516364]  [<ffffffff8111e90f>] ? do_utimes+0x8a/0xd6
[ 8640.516367]  [<ffffffff810fc7a2>] ? vfs_read+0x9f/0xe6
[ 8640.516369]  [<ffffffff8111ea24>] ? sys_utimensat+0x64/0x6b
[ 8640.516372]  [<ffffffff8134fd52>] ? system_call_fastpath+0x16/0x1b


[ 9654.042164] ieee80211 phy0: failed to reallocate TX buffer
[ 9654.042189] ieee80211 phy0: failed to reallocate TX buffer
(120,000 lines of this)

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
  2012-03-11 18:32 ` Marc MERLIN
@ 2012-03-29  5:31   ` Marc MERLIN
  2012-03-29  6:49     ` David Miller
  2012-03-29 16:41       ` Marc MERLIN
  0 siblings, 2 replies; 72+ messages in thread
From: Marc MERLIN @ 2012-03-29  5:31 UTC (permalink / raw)
  To: Linux Kernel mailing List

I upgraded the laptop from 4GB to 8GB and I still cannot to a big
copy via NFS without the laptop hanging pretty badly.

Doing the same copy with rsync over ssh seems to have worked.

This is a pretty bad failure. Someone please tell me there is something I can do 
to work around the problem, and get the underlying problem fixed.

Thanks,
Marc

I tried the copy over wifi this time, cna got:
mc: page allocation failure: order:1, mode:0x20
Pid: 7099, comm: mc Tainted: G        W  O 3.2.8-amd64-volpreempt-noide-20120208 #1
Call Trace:
 <IRQ>  [<ffffffff810b9ec0>] ? warn_alloc_failed+0x11f/0x132
 [<ffffffff810bcdaa>] ? __alloc_pages_nodemask+0x6b1/0x72f
 [<ffffffff810ec911>] ? kmem_getpages+0x4c/0xd9
 [<ffffffff810ec911>] ? kmem_getpages+0x4c/0xd9
 [<ffffffff810edd21>] ? fallback_alloc+0x123/0x1c2
 [<ffffffff812846db>] ? pskb_expand_head+0xe0/0x24a
 [<ffffffff810ee215>] ? __kmalloc+0xb2/0x10a
 [<ffffffff812846db>] ? pskb_expand_head+0xe0/0x24a
 [<ffffffffa096a2c1>] ? ieee80211_skb_resize+0x64/0x9d [mac80211]
 [<ffffffffa096c252>] ? ieee80211_subif_start_xmit+0x705/0x883 [mac80211]
 [<ffffffff81036108>] ? test_tsk_need_resched+0xe/0x17
 [<ffffffff8128e767>] ? dev_hard_start_xmit+0x40b/0x552
 [<ffffffff8104c8ec>] ? raise_softirq_irqoff+0x6/0x27
 [<ffffffff812a4adc>] ? sch_direct_xmit+0x63/0x13a
 [<ffffffff8128eb8e>] ? dev_queue_xmit+0x2e0/0x4b5
 [<ffffffff812b764d>] ? ip_finish_output2+0x1c7/0x218
 [<ffffffff812b86aa>] ? __ip_flush_pending_frames.isra.29+0x69/0x69
 [<ffffffff812b8a6a>] ? ip_queue_xmit+0x2cd/0x30d
 [<ffffffff81066be9>] ? getnstimeofday+0x4a/0x7b
 [<ffffffff812ca1d2>] ? tcp_transmit_skb+0x6d7/0x70a
 [<ffffffff812cac5f>] ? tcp_write_xmit+0x698/0x7a1
 [<ffffffff812c77bf>] ? tcp_ack+0x14e3/0x1658
 [<ffffffff812c89bd>] ? tcp_established_options+0x2b/0x9e
 [<ffffffff812cada9>] ? __tcp_push_pending_frames+0x18/0x44
 [<ffffffff812c4e27>] ? tcp_data_snd_check+0x2c/0xfd
 [<ffffffff812c86c5>] ? tcp_rcv_established+0x4f0/0x549
 [<ffffffff812ce735>] ? tcp_v4_do_rcv+0x166/0x323
 [<ffffffff812cfdce>] ? tcp_v4_rcv+0x404/0x65d
 [<ffffffff81036108>] ? test_tsk_need_resched+0xe/0x17
 [<ffffffff812b4d55>] ? ip_local_deliver_finish+0x148/0x1ba
 [<ffffffff8128cfa4>] ? __netif_receive_skb+0x3f2/0x43f
 [<ffffffff8128d31d>] ? netif_receive_skb+0x7e/0x84
 [<ffffffffa0966bd6>] ? ieee80211_deliver_skb+0xbb/0xf1 [mac80211]
 [<ffffffffa0967f32>] ? ieee80211_rx_handlers+0x1041/0x18a7 [mac80211]
 [<ffffffff810528d2>] ? lock_timer_base.isra.29+0x23/0x47
 [<ffffffff81071629>] ? arch_local_irq_save+0x11/0x17
 [<ffffffff8134b58b>] ? _raw_spin_lock_irqsave+0x1c/0x41
 [<ffffffffa0966730>] ? ieee80211_release_reorder_frame+0x35/0x4a [mac80211]
 [<ffffffffa0968fb2>] ? ieee80211_prepare_and_rx_handle+0x81a/0x872 [mac80211]
 [<ffffffffa09696e6>] ? ieee80211_rx+0x6dc/0x706 [mac80211]
 [<ffffffffa09a20e4>] ? iwlagn_rx_reply_rx+0x3c2/0x3dc [iwlwifi]
 [<ffffffff810398a1>] ? resched_task+0x48/0x6c
 [<ffffffffa09ab9f8>] ? iwl_irq_tasklet+0x446/0x6df [iwlwifi]
 [<ffffffff81039ff4>] ? check_preempt_curr+0x52/0x5f
 [<ffffffff8104c445>] ? tasklet_action+0x79/0xc8
 [<ffffffff8104c581>] ? __do_softirq+0xc0/0x188
 [<ffffffff81351fac>] ? call_softirq+0x1c/0x30
 [<ffffffff8100f98d>] ? do_softirq+0x3c/0x7b
 [<ffffffff8104c87c>] ? irq_exit+0x3d/0xa7
 [<ffffffff8100f6b4>] ? do_IRQ+0x81/0x97
 [<ffffffff8134ba2e>] ? common_interrupt+0x6e/0x6e
 <EOI>  [<ffffffff8134fe84>] ? sysret_audit+0x16/0x20
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd: 173
CPU    1: hi:  186, btch:  31 usd: 134
Node 0 Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd: 157
CPU    1: hi:  186, btch:  31 usd: 101
active_anon:1031319 inactive_anon:207860 isolated_anon:0
 active_file:234263 inactive_file:341759 isolated_file:0
 unevictable:9 dirty:21221 writeback:96084 unstable:129
 free:40765 slab_reclaimable:89393 slab_unreclaimable:19850
 mapped:29109 shmem:65088 pagetables:21560 bounce:0
Node 0 DMA free:15908kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2960 7947 7947
Node 0 DMA32 free:94360kB min:25128kB low:31408kB high:37692kB active_anon:1122272kB inactive_anon:301128kB active_file:479144kB inactive_file:802228kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3031688kB mlocked:0kB dirty:25936kB writeback:193704kB mapped:41696kB shmem:78944kB slab_reclaimable:174676kB slab_unreclaimable:19700kB kernel_stack:3056kB pagetables:11056kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no

Afer a few of these, I just got, a loop of 'failed to reallocate TX buffer'
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd:  32
CPU    1: hi:  186, btch:  31 usd:  20
Node 0 Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd: 132
CPU    1: hi:  186, btch:  31 usd: 174
active_anon:1025929 inactive_anon:209027 isolated_anon:17
 active_file:235971 inactive_file:313547 isolated_file:6
 unevictable:9 dirty:86311 writeback:91894 unstable:4359
 free:70274 slab_reclaimable:88825 slab_unreclaimable:21955
 mapped:28146 shmem:66271 pagetables:21542 bounce:0
Node 0 DMA free:15908kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2960 7947 7947
Node 0 DMA32 free:212292kB min:25128kB low:31408kB high:37692kB active_anon:1110276kB inactive_anon: 304520kB active_file:479356kB inactive_file:691616kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3031688kB mlocked:0kB dirty:111500kB writeback:193764kB mapped:41348kB shmem:82336kB slab_reclaimable:173504kB slab_unreclaimable:23312kB kernel_stack:3056kB pagetables:11028kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 4986 4986
Node 0 Normal free:52896kB min:42324kB low:52904kB high:63484kB active_anon:2993440kB inactive_anon:531588kB active_file:464528kB inactive_file:562572kB unevictable:36kB isolated(anon):68kB isolated(file):24kB present:5106560kB mlocked:36kB dirty:233744kB writeback:173812kB mapped:71236kB shmem:182748kB slab_reclaimable:181796kB slab_unreclaimable:64508kB kernel_stack:3736kB pagetables:75140kB unstable:17436kB bounce:0kB writeback_tmp:0kB pages_scanned:98 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15908kB
Node 0 DMA32: 51901*4kB 1*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 212316kB
Node 0 Normal: 12200*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 52896kB
682814 total pagecache pages
67047 pages in swap cache
Swap cache stats: add 802793, delete 735746, find 537857/581722
Free swap  = 2660592kB
Total swap = 4106248kB
2080752 pages RAM
57174 pages reserved
842957 pages shared
1472720 pages non-shared
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer


On Sun, Mar 11, 2012 at 11:32:44AM -0700, Marc MERLIN wrote:
> I still need some help narrowing this bug down so that I can give a
> better report.
> 
> First, the problem is not new to 3.2.8, it appeared after I upgraded
> my laptop to a new distribution and switched from 32bits to 64bit kernel
> and userland. This in turn seems to be causing memory allocation problems.
> 
> When I make a big NFS copy, apparently interrupts don't get serviced
> after a while, and my X screen won't update itself (clock) and my mouse
> cursor won't move.
> Then, after maybe a minute or more, it usually recovers for a few
> seconds, and then locks again. The copy usually finishes eventually
> if I leave the laptop for a few hours on its own while the UI is
> unresponsive.
> I've had the problem with both wired and wireless copies, and tried
> doing a copy with smbmount instead of nfs, and the issue was similar.
> 
> I first had a vol prempt kernel, and tried recompiling with preempt and
> that did not help.
> My kernel config is here: http://marc.merlins.org/tmp/config.txt
> 
> Last night, I started a big copy, and this morning found the copy half
> hung. The laptop was responsive, but while I could ping, TCP connections
> would put the process in unkillable kernel hung state.
> 
> Below are some sysreq dumps I took (syslog to local disk was still
> working fine). I know I have Tainted 'G', and I have no idea where that
> came from, sorry :-/
> 
> It looks like I may run out of some kind of memory which in turn is
> deadlocking some drivers?
> (actual total memory is fine, user apps do not get OOM'ed and 'free' looked fine)
> 
> I ran the relevant sysrq commands wihch are at 
> http://marc.merlins.org/tmp/sysrq.txt
> (too big to paste here) 
> 
> But lots of tasks are hung like so:
> [28451.191115] WorkerPool/1248 D ffff88013bc93580     0 12483   3740 0x00000080
> [28451.191115]  ffff8801189ba100 0000000000000082 0000000000000000 ffff880134f2e180
> [28451.191115]  0000000000013580 ffff88001614bfd8 ffff88001614bfd8 ffff8801189ba100
> [28451.191115]  ffffffff811b4b62 000000010164525a 0000000000000046 ffffffff8165a250
> [28451.191115] Call Trace:
> [28451.191115]  [<ffffffff811b4b62>] ? sha_transform+0x395/0x1209
> [28451.191115]  [<ffffffff8134a9b4>] ? __mutex_lock_common.isra.6+0x13d/0x219
> [28451.191115]  [<ffffffff81242714>] ? extract_buf+0x86/0xf2
> [28451.191115]  [<ffffffff8134a7e6>] ? mutex_lock+0xf/0x1f
> [28451.191115]  [<ffffffff81298979>] ? rtnetlink_rcv+0xe/0x28
> [28451.191115]  [<ffffffff812ad007>] ? netlink_unicast+0xe6/0x14e
> [28451.191115]  [<ffffffff812ad26b>] ? netlink_sendmsg+0x1fc/0x237
> [28451.191115]  [<ffffffff8127c770>] ? sock_sendmsg+0xc1/0xde
> [28451.191115]  [<ffffffff810eca23>] ? __cache_free.isra.40+0x19/0x1a7
> [28451.191115]  [<ffffffff813496be>] ? nl_pid_hash_rehash+0xc8/0xef
> [28451.191115]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
> [28451.191115]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
> [28451.191115]  [<ffffffff8134e1d2>] ? sub_preempt_count+0x83/0x94
> [28451.191115]  [<ffffffff810fd81e>] ? fget_light+0x85/0x8d
> [28451.191115]  [<ffffffff8127e0e3>] ? sys_sendto+0xf7/0x137
> [28451.191115]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
> [28451.191115]  [<ffffffff8134e1d2>] ? sub_preempt_count+0x83/0x94
> [28451.191115]  [<ffffffff8134b725>] ? _raw_spin_unlock+0x24/0x30
> [28451.191115]  [<ffffffff8108d73e>] ? audit_syscall_entry+0x105/0x130
> [28451.191115]  [<ffffffff8134fd52>] ? system_call_fastpath+0x16/0x1b
> 
> 
> 
> Below are lines I got in syslog during the copy.
> Highlight is:
> [ 4437.367046] kworker/1:1: page allocation failure: order:1, mode:0x20
> and then:
> [ 8640.516177] INFO: task flush-0:37:7122 blocked for more than 120 seconds.
> and then 120,000 lines(!) of:
> [ 9654.042164] ieee80211 phy0: failed to reallocate TX buffer
> 
> unedited lines below.
> 
> So, any idea of what I can try next?
> 
> Thanks,
> Marc
> 
> 
> [ 4437.367046] kworker/1:1: page allocation failure: order:1, mode:0x20
> [ 4437.367053] Pid: 8067, comm: kworker/1:1 Tainted: G           O 3.2.8-amd64-volpreempt-noide-20120208 #1
> [ 4437.367056] Call Trace:
> [ 4437.367058]  <IRQ>  [<ffffffff810b9ec0>] ? warn_alloc_failed+0x11f/0x132
> [ 4437.367074]  [<ffffffff810bcdaa>] ? __alloc_pages_nodemask+0x6b1/0x72f
> [ 4437.367081]  [<ffffffff810ec911>] ? kmem_getpages+0x4c/0xd9
> [ 4437.367086]  [<ffffffff810ec911>] ? kmem_getpages+0x4c/0xd9
> [ 4437.367090]  [<ffffffff810edd21>] ? fallback_alloc+0x123/0x1c2
> [ 4437.367096]  [<ffffffff812846db>] ? pskb_expand_head+0xe0/0x24a
> [ 4437.367101]  [<ffffffff810ee215>] ? __kmalloc+0xb2/0x10a
> [ 4437.367105]  [<ffffffff812846db>] ? pskb_expand_head+0xe0/0x24a
> [ 4437.367139]  [<ffffffffa03e22c1>] ? ieee80211_skb_resize+0x64/0x9d [mac80211]
> [ 4437.367154]  [<ffffffffa03e4252>] ? ieee80211_subif_start_xmit+0x705/0x883 [mac80211]
> [ 4437.367175]  [<ffffffff8128e767>] ? dev_hard_start_xmit+0x40b/0x552
> [ 4437.367179]  [<ffffffff812a4adc>] ? sch_direct_xmit+0x63/0x13a
> [ 4437.367182]  [<ffffffff8128eb8e>] ? dev_queue_xmit+0x2e0/0x4b5
> [ 4437.367185]  [<ffffffff812b764d>] ? ip_finish_output2+0x1c7/0x218
> [ 4437.367188]  [<ffffffff812b86aa>] ? __ip_flush_pending_frames.isra.29+0x69/0x69
> [ 4437.367191]  [<ffffffff812b8a6a>] ? ip_queue_xmit+0x2cd/0x30d
> [ 4437.367195]  [<ffffffff81066be9>] ? getnstimeofday+0x4a/0x7b
> [ 4437.367198]  [<ffffffff812ca1d2>] ? tcp_transmit_skb+0x6d7/0x70a
> [ 4437.367201]  [<ffffffff812cac5f>] ? tcp_write_xmit+0x698/0x7a1
> [ 4437.367204]  [<ffffffff812c77bf>] ? tcp_ack+0x14e3/0x1658
> [ 4437.367207]  [<ffffffff812c89bd>] ? tcp_established_options+0x2b/0x9e
> [ 4437.367210]  [<ffffffff812cada9>] ? __tcp_push_pending_frames+0x18/0x44
> [ 4437.367213]  [<ffffffff812c4e27>] ? tcp_data_snd_check+0x2c/0xfd
> [ 4437.367216]  [<ffffffff812c86c5>] ? tcp_rcv_established+0x4f0/0x549
> [ 4437.367220]  [<ffffffff8103ec39>] ? select_task_rq_fair+0x67b/0x690
> [ 4437.367223]  [<ffffffff812ce735>] ? tcp_v4_do_rcv+0x166/0x323
> [ 4437.367226]  [<ffffffff812cfdce>] ? tcp_v4_rcv+0x404/0x65d
> [ 4437.367230]  [<ffffffff812b4d55>] ? ip_local_deliver_finish+0x148/0x1ba
> [ 4437.367233]  [<ffffffff8128cfa4>] ? __netif_receive_skb+0x3f2/0x43f
> [ 4437.367236]  [<ffffffff8128d31d>] ? netif_receive_skb+0x7e/0x84
> [ 4437.367239]  [<ffffffff8128d7dd>] ? napi_gro_receive+0x1c/0x29
> [ 4437.367241]  [<ffffffff8128d398>] ? napi_skb_finish+0x1c/0x31
> [ 4437.367253]  [<ffffffffa026bde3>] ? e1000_clean_rx_irq+0x1f3/0x290 [e1000e]
> [ 4437.367261]  [<ffffffffa026c26c>] ? e1000_clean+0x69/0x208 [e1000e]
> [ 4437.367264]  [<ffffffff8128d8fb>] ? net_rx_action+0xa4/0x1c0
> [ 4437.367268]  [<ffffffff8104c581>] ? __do_softirq+0xc0/0x188
> [ 4437.367272]  [<ffffffff81351fac>] ? call_softirq+0x1c/0x30
> [ 4437.367276]  [<ffffffff8100f98d>] ? do_softirq+0x3c/0x7b
> [ 4437.367278]  [<ffffffff8104c87c>] ? irq_exit+0x3d/0xa7
> [ 4437.367281]  [<ffffffff8100f6b4>] ? do_IRQ+0x81/0x97
> [ 4437.367285]  [<ffffffff8134ba2e>] ? common_interrupt+0x6e/0x6e
> [ 4437.367287]  <EOI>  [<ffffffffa008b32c>] ? dec128+0x434/0x80c [aes_x86_64]
> [ 4437.367307]  [<ffffffffa0085164>] ? crypt+0xae/0x101 [xts]
> [ 4437.367313]  [<ffffffffa008b712>] ? aes_decrypt+0xe/0xe [aes_x86_64]
> [ 4437.367320]  [<ffffffffa008b704>] ? dec128+0x80c/0x80c [aes_x86_64]
> [ 4437.367327]  [<ffffffffa00851f6>] ? decrypt+0x3f/0x44 [xts]
> [ 4437.367331]  [<ffffffff8118cdb3>] ? async_decrypt+0x37/0x3c
> [ 4437.367338]  [<ffffffffa0105e2a>] ? crypt_convert+0x22f/0x2c4 [dm_crypt]
> [ 4437.367342]  [<ffffffff8100d02f>] ? load_TLS+0x7/0xa
> [ 4437.367348]  [<ffffffffa01061b8>] ? kcryptd_crypt+0x56/0x342 [dm_crypt]
> [ 4437.367352]  [<ffffffff81038cd2>] ? finish_task_switch+0x86/0xb7
> [ 4437.367355]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
> [ 4437.367358]  [<ffffffff8134e1d2>] ? sub_preempt_count+0x83/0x94
> [ 4437.367361]  [<ffffffff8103612b>] ? need_resched+0x1a/0x23
> [ 4437.367368]  [<ffffffffa0106162>] ? crypt_convert_init.isra.14+0x4f/0x4f [dm_crypt]
> [ 4437.367372]  [<ffffffff8105b867>] ? process_one_work+0x16d/0x298
> [ 4437.367375]  [<ffffffff8105c84a>] ? worker_thread+0xc2/0x145
> [ 4437.367378]  [<ffffffff8105c788>] ? manage_workers.isra.23+0x15b/0x15b
> [ 4437.367381]  [<ffffffff8105f9fe>] ? kthread+0x76/0x7e
> [ 4437.367384]  [<ffffffff81351eb4>] ? kernel_thread_helper+0x4/0x10
> [ 4437.367387]  [<ffffffff8105f988>] ? kthread_worker_fn+0x139/0x139
> [ 4437.367390]  [<ffffffff81351eb0>] ? gs_change+0x13/0x13
> [ 4437.367392] Mem-Info:
> [ 4437.367393] Node 0 DMA per-cpu:
> [ 4437.367396] CPU    0: hi:    0, btch:   1 usd:   0
> [ 4437.367397] CPU    1: hi:    0, btch:   1 usd:   0
> [ 4437.367399] Node 0 DMA32 per-cpu:
> [ 4437.367401] CPU    0: hi:  186, btch:  31 usd: 164
> [ 4437.367403] CPU    1: hi:  186, btch:  31 usd: 111
> [ 4437.367405] Node 0 Normal per-cpu:
> [ 4437.367407] CPU    0: hi:  186, btch:  31 usd: 114
> [ 4437.367409] CPU    1: hi:  186, btch:  31 usd: 158
> [ 4437.367413] active_anon:391300 inactive_anon:132951 isolated_anon:0
> [ 4437.367414]  active_file:136666 inactive_file:140710 isolated_file:31
> [ 4437.367415]  unevictable:1 dirty:3402 writeback:26688 unstable:7844
> [ 4437.367416]  free:36509 slab_reclaimable:85289 slab_unreclaimable:35524
> [ 4437.367417]  mapped:18088 shmem:35934 pagetables:9300 bounce:0
> [ 4437.367419] Node 0 DMA free:15712kB min:260kB low:324kB high:388kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:36kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:36kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:160kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:40833 all_unreclaimable? yes
> [ 4437.367428] lowmem_reserve[]: 0 2960 3907 3907
> [ 4437.367432] Node 0 DMA32 free:110732kB min:51004kB low:63752kB high:76504kB active_anon:1380396kB inactive_anon:345140kB active_file:422008kB inactive_file:437440kB unevictable:4kB isolated(anon):0kB isolated(file):124kB present:3031688kB mlocked:4kB dirty:7148kB writeback:72004kB mapped:39424kB shmem:64836kB slab_reclaimable:212408kB slab_unreclaimable:80516kB kernel_stack:1720kB pagetables:19252kB unstable:23964kB bounce:0kB writeback_tmp:0kB pages_scanned:63 all_unreclaimable? no
> [ 4437.367442] lowmem_reserve[]: 0 0 946 946
> [ 4437.367445] Node 0 Normal free:19592kB min:16312kB low:20388kB high:24468kB active_anon:184804kB inactive_anon:186664kB active_file:124656kB inactive_file:125364kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:969600kB mlocked:0kB dirty:6460kB writeback:34712kB mapped:32928kB shmem:78900kB slab_reclaimable:128748kB slab_unreclaimable:61420kB kernel_stack:2792kB pagetables:17948kB unstable:7412kB bounce:0kB writeback_tmp:0kB pages_scanned:89 all_unreclaimable? no
> [ 4437.367455] lowmem_reserve[]: 0 0 0 0
> [ 4437.367458] Node 0 DMA: 2*4kB 1*8kB 1*16kB 0*32kB 1*64kB 2*128kB 2*256kB 1*512kB 2*1024kB 2*2048kB 2*4096kB = 15712kB
> [ 4437.367467] Node 0 DMA32: 25961*4kB 73*8kB 8*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 110732kB
> [ 4437.367475] Node 0 Normal: 4134*4kB 0*8kB 1*16kB 1*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 19656kB
> [ 4437.367484] 317456 total pagecache pages
> [ 4437.367485] 4042 pages in swap cache
> [ 4437.367487] Swap cache stats: add 31786, delete 27744, find 10282/11070
> [ 4437.367489] Free swap  = 4012560kB
> [ 4437.367490] Total swap = 4106248kB
> [ 4437.370978] 1032176 pages RAM
> [ 4437.370978] 42834 pages reserved
> [ 4437.370978] 390787 pages shared
> [ 4437.370978] 750687 pages non-shared
> 
> 
> [ 8640.516177] INFO: task flush-0:37:7122 blocked for more than 120 seconds.
> [ 8640.516182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 8640.516186] flush-0:37      D ffff88013bc93580     0  7122      2 0x00000080
> [ 8640.516192]  ffff880072c28810 0000000000000046 ffff880100000000 ffff880134f2e180
> [ 8640.516199]  0000000000013580 ffff88006d491fd8 ffff88006d491fd8 ffff880072c28810
> [ 8640.516205]  ffff88013bfd1c50 000000018134b58b ffff88010c3cc1b0 ffff88006d491d18
> [ 8640.516211] Call Trace:
> [ 8640.516221]  [<ffffffff8110e81a>] ? inode_owner_or_capable+0x36/0x36
> [ 8640.516226]  [<ffffffff8110e820>] ? inode_wait+0x6/0xa
> [ 8640.516232]  [<ffffffff8134a72c>] ? __wait_on_bit+0x3e/0x71
> [ 8640.516241]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
> [ 8640.516245]  [<ffffffff81119674>] ? inode_wait_for_writeback+0xa2/0xc8
> [ 8640.516249]  [<ffffffff810600c9>] ? autoremove_wake_function+0x2a/0x2a
> [ 8640.516252]  [<ffffffff8111b4b4>] ? wb_writeback+0x226/0x255
> [ 8640.516255]  [<ffffffff8134e27d>] ? add_preempt_count+0x9a/0x9c
> [ 8640.516258]  [<ffffffff8111b8d4>] ? wb_do_writeback+0x150/0x1b2
> [ 8640.516261]  [<ffffffff8111b9c5>] ? bdi_writeback_thread+0x8f/0x204
> [ 8640.516264]  [<ffffffff8111b936>] ? wb_do_writeback+0x1b2/0x1b2
> [ 8640.516266]  [<ffffffff8105f9fe>] ? kthread+0x76/0x7e
> [ 8640.516270]  [<ffffffff81351eb4>] ? kernel_thread_helper+0x4/0x10
> [ 8640.516273]  [<ffffffff8105f988>] ? kthread_worker_fn+0x139/0x139
> [ 8640.516275]  [<ffffffff81351eb0>] ? gs_change+0x13/0x13
> [ 8640.516281] INFO: task cp:7568 blocked for more than 120 seconds.
> [ 8640.516283] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 8640.516284] cp              D ffff88013bc13580     0  7568   6744 0x00000080
> [ 8640.516288]  ffff880123976750 0000000000000082 0000000000000000 ffffffff8160d020
> [ 8640.516292]  0000000000013580 ffff88001b3a9fd8 ffff88001b3a9fd8 ffff880123976750
> [ 8640.516295]  0000000000000001 0000000181066767 ffff880131463e50 ffff88013bc13e08
> [ 8640.516299] Call Trace:
> [ 8640.516303]  [<ffffffff810b5d03>] ? __lock_page+0x66/0x66
> [ 8640.516306]  [<ffffffff8134a2ec>] ? io_schedule+0x58/0x6f
> [ 8640.516308]  [<ffffffff810b5d09>] ? sleep_on_page+0x6/0xa
> [ 8640.516311]  [<ffffffff8134a72c>] ? __wait_on_bit+0x3e/0x71
> [ 8640.516313]  [<ffffffff810b5e51>] ? wait_on_page_bit+0x6e/0x73
> [ 8640.516316]  [<ffffffff810600c9>] ? autoremove_wake_function+0x2a/0x2a
> [ 8640.516319]  [<ffffffff810b5f29>] ? filemap_fdatawait_range+0x74/0x139
> [ 8640.516327]  [<ffffffff8111acab>] ? writeback_single_inode+0x155/0x2f4
> [ 8640.516330]  [<ffffffff8111ae94>] ? sync_inode+0x4a/0x6f
> [ 8640.516343]  [<ffffffffa06b9b02>] ? nfs_wb_all+0x39/0x3e [nfs]
> [ 8640.516351]  [<ffffffffa06aeed1>] ? nfs_setattr+0x8e/0xf6 [nfs]
> [ 8640.516354]  [<ffffffff811104c3>] ? notify_change+0x177/0x24f
> [ 8640.516357]  [<ffffffff8111e85c>] ? utimes_common+0x10c/0x135
> [ 8640.516361]  [<ffffffff810fd55a>] ? fget+0x50/0x57
> [ 8640.516364]  [<ffffffff8111e90f>] ? do_utimes+0x8a/0xd6
> [ 8640.516367]  [<ffffffff810fc7a2>] ? vfs_read+0x9f/0xe6
> [ 8640.516369]  [<ffffffff8111ea24>] ? sys_utimensat+0x64/0x6b
> [ 8640.516372]  [<ffffffff8134fd52>] ? system_call_fastpath+0x16/0x1b
> 
> 
> [ 9654.042164] ieee80211 phy0: failed to reallocate TX buffer
> [ 9654.042189] ieee80211 phy0: failed to reallocate TX buffer
> (120,000 lines of this)
> 
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
  2012-03-29  5:31   ` Marc MERLIN
@ 2012-03-29  6:49     ` David Miller
  2012-03-29 14:05       ` Marc MERLIN
  2012-03-29 16:41       ` Marc MERLIN
  1 sibling, 1 reply; 72+ messages in thread
From: David Miller @ 2012-03-29  6:49 UTC (permalink / raw)
  To: marc; +Cc: linux-kernel

From: Marc MERLIN <marc@merlins.org>
Date: Wed, 28 Mar 2012 22:31:11 -0700

> This is a pretty bad failure. Someone please tell me there is
> something I can do to work around the problem, and get the
> underlying problem fixed.

First step would be to report this to the proper mailing list,
most networking and wireless developers do not read linux-kernel.

I'd start with linux-wireless@vger.kernel.org and netdev@vger.kernel.org

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
  2012-03-29  6:49     ` David Miller
@ 2012-03-29 14:05       ` Marc MERLIN
  0 siblings, 0 replies; 72+ messages in thread
From: Marc MERLIN @ 2012-03-29 14:05 UTC (permalink / raw)
  To: David Miller, James Courtier-Dutton; +Cc: linux-kernel

On Thu, Mar 29, 2012 at 02:49:10AM -0400, David Miller wrote:
> From: Marc MERLIN <marc@merlins.org>
> Date: Wed, 28 Mar 2012 22:31:11 -0700
>
> > This is a pretty bad failure. Someone please tell me there is
> > something I can do to work around the problem, and get the
> > underlying problem fixed.
>
> First step would be to report this to the proper mailing list,
> most networking and wireless developers do not read linux-kernel.
>
> I'd start with linux-wireless@vger.kernel.org and netdev@vger.kernel.org

On Thu, Mar 29, 2012 at 09:36:02AM +0100, James Courtier-Dutton wrote:
> That looks like a network card problem. Probably a hardware bug. Try a
> different network card if you can.

Thanks both for the answer. I don't think it's a network hardware
problem since it happens with both e1000e wired and iwlwifi wireless.

I wasn't quite sure if it was a memory allocation problem that caused the
hangs, but you're right that it's most likely a networking issue regardless
of the fact that it happens on both wired and wireless.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/

^ permalink raw reply	[flat|nested] 72+ messages in thread

* 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
@ 2012-03-29 16:38 Marc MERLIN
  2012-03-11 18:32 ` Marc MERLIN
  2012-03-29 18:09 ` Ben Hutchings
  0 siblings, 2 replies; 72+ messages in thread
From: Marc MERLIN @ 2012-03-29 16:38 UTC (permalink / raw)
  To: linux-wireless, netdev

Howdy,

I need some help narrowing this bug down so that I can give a better report.

First, the problem is not new to 3.2.8, it appeared after I upgraded
my laptop to a new distribution and switched from 32bits to 64bit kernel
and userland. This in turn seems to be causing memory allocation problems.

When I make a big NFS copy, apparently interrupts don't get serviced
after a while, and my X screen won't update itself (clock) and my mouse
cursor won't move.
Then, after maybe a minute or more, it usually recovers for a few
seconds, and then locks again. The copy usually finishes eventually
if I leave the laptop for a few hours on its own while the UI is
unresponsive.
I've had the problem with both wired and wireless copies, and tried
doing a copy with smbmount instead of nfs, and the issue was similar.

I first had a vol prempt kernel, and tried recompiling with preempt and
that did not help.
My kernel config is here: http://marc.merlins.org/tmp/config.txt

Last night, I started a big copy, and this morning found the copy half
hung. The laptop was responsive, but while I could ping, TCP connections
would put the process in unkillable kernel hung state.

Below are some sysreq dumps I took (syslog to local disk was still
working fine). I know I have Tainted 'G', and I have no idea where that
came from, sorry :-/

It looks like I may run out of some kind of memory which in turn is
deadlocking some drivers?
(actual total memory is fine, user apps do not get OOM'ed and 'free' looked fine)

I ran the relevant sysrq commands wihch are at 
http://marc.merlins.org/tmp/sysrq.txt
(too big to paste here) 

But lots of tasks are hung like so:
[28451.191115] WorkerPool/1248 D ffff88013bc93580     0 12483   3740 0x00000080
[28451.191115]  ffff8801189ba100 0000000000000082 0000000000000000 ffff880134f2e180
[28451.191115]  0000000000013580 ffff88001614bfd8 ffff88001614bfd8 ffff8801189ba100
[28451.191115]  ffffffff811b4b62 000000010164525a 0000000000000046 ffffffff8165a250
[28451.191115] Call Trace:
[28451.191115]  [<ffffffff811b4b62>] ? sha_transform+0x395/0x1209
[28451.191115]  [<ffffffff8134a9b4>] ? __mutex_lock_common.isra.6+0x13d/0x219
[28451.191115]  [<ffffffff81242714>] ? extract_buf+0x86/0xf2
[28451.191115]  [<ffffffff8134a7e6>] ? mutex_lock+0xf/0x1f
[28451.191115]  [<ffffffff81298979>] ? rtnetlink_rcv+0xe/0x28
[28451.191115]  [<ffffffff812ad007>] ? netlink_unicast+0xe6/0x14e
[28451.191115]  [<ffffffff812ad26b>] ? netlink_sendmsg+0x1fc/0x237
[28451.191115]  [<ffffffff8127c770>] ? sock_sendmsg+0xc1/0xde
[28451.191115]  [<ffffffff810eca23>] ? __cache_free.isra.40+0x19/0x1a7
[28451.191115]  [<ffffffff813496be>] ? nl_pid_hash_rehash+0xc8/0xef
[28451.191115]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
[28451.191115]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
[28451.191115]  [<ffffffff8134e1d2>] ? sub_preempt_count+0x83/0x94
[28451.191115]  [<ffffffff810fd81e>] ? fget_light+0x85/0x8d
[28451.191115]  [<ffffffff8127e0e3>] ? sys_sendto+0xf7/0x137
[28451.191115]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
[28451.191115]  [<ffffffff8134e1d2>] ? sub_preempt_count+0x83/0x94
[28451.191115]  [<ffffffff8134b725>] ? _raw_spin_unlock+0x24/0x30
[28451.191115]  [<ffffffff8108d73e>] ? audit_syscall_entry+0x105/0x130
[28451.191115]  [<ffffffff8134fd52>] ? system_call_fastpath+0x16/0x1b



Below are lines I got in syslog during the copy.
Highlight is:
[ 4437.367046] kworker/1:1: page allocation failure: order:1, mode:0x20
and then:
[ 8640.516177] INFO: task flush-0:37:7122 blocked for more than 120 seconds.
and then 120,000 lines(!) of:
[ 9654.042164] ieee80211 phy0: failed to reallocate TX buffer

unedited lines below.

So, any idea of what I can try next?

Thanks,
Marc


[ 4437.367046] kworker/1:1: page allocation failure: order:1, mode:0x20
[ 4437.367053] Pid: 8067, comm: kworker/1:1 Tainted: G           O 3.2.8-amd64-volpreempt-noide-20120208 #1
[ 4437.367056] Call Trace:
[ 4437.367058]  <IRQ>  [<ffffffff810b9ec0>] ? warn_alloc_failed+0x11f/0x132
[ 4437.367074]  [<ffffffff810bcdaa>] ? __alloc_pages_nodemask+0x6b1/0x72f
[ 4437.367081]  [<ffffffff810ec911>] ? kmem_getpages+0x4c/0xd9
[ 4437.367086]  [<ffffffff810ec911>] ? kmem_getpages+0x4c/0xd9
[ 4437.367090]  [<ffffffff810edd21>] ? fallback_alloc+0x123/0x1c2
[ 4437.367096]  [<ffffffff812846db>] ? pskb_expand_head+0xe0/0x24a
[ 4437.367101]  [<ffffffff810ee215>] ? __kmalloc+0xb2/0x10a
[ 4437.367105]  [<ffffffff812846db>] ? pskb_expand_head+0xe0/0x24a
[ 4437.367139]  [<ffffffffa03e22c1>] ? ieee80211_skb_resize+0x64/0x9d [mac80211]
[ 4437.367154]  [<ffffffffa03e4252>] ? ieee80211_subif_start_xmit+0x705/0x883 [mac80211]
[ 4437.367175]  [<ffffffff8128e767>] ? dev_hard_start_xmit+0x40b/0x552
[ 4437.367179]  [<ffffffff812a4adc>] ? sch_direct_xmit+0x63/0x13a
[ 4437.367182]  [<ffffffff8128eb8e>] ? dev_queue_xmit+0x2e0/0x4b5
[ 4437.367185]  [<ffffffff812b764d>] ? ip_finish_output2+0x1c7/0x218
[ 4437.367188]  [<ffffffff812b86aa>] ? __ip_flush_pending_frames.isra.29+0x69/0x69
[ 4437.367191]  [<ffffffff812b8a6a>] ? ip_queue_xmit+0x2cd/0x30d
[ 4437.367195]  [<ffffffff81066be9>] ? getnstimeofday+0x4a/0x7b
[ 4437.367198]  [<ffffffff812ca1d2>] ? tcp_transmit_skb+0x6d7/0x70a
[ 4437.367201]  [<ffffffff812cac5f>] ? tcp_write_xmit+0x698/0x7a1
[ 4437.367204]  [<ffffffff812c77bf>] ? tcp_ack+0x14e3/0x1658
[ 4437.367207]  [<ffffffff812c89bd>] ? tcp_established_options+0x2b/0x9e
[ 4437.367210]  [<ffffffff812cada9>] ? __tcp_push_pending_frames+0x18/0x44
[ 4437.367213]  [<ffffffff812c4e27>] ? tcp_data_snd_check+0x2c/0xfd
[ 4437.367216]  [<ffffffff812c86c5>] ? tcp_rcv_established+0x4f0/0x549
[ 4437.367220]  [<ffffffff8103ec39>] ? select_task_rq_fair+0x67b/0x690
[ 4437.367223]  [<ffffffff812ce735>] ? tcp_v4_do_rcv+0x166/0x323
[ 4437.367226]  [<ffffffff812cfdce>] ? tcp_v4_rcv+0x404/0x65d
[ 4437.367230]  [<ffffffff812b4d55>] ? ip_local_deliver_finish+0x148/0x1ba
[ 4437.367233]  [<ffffffff8128cfa4>] ? __netif_receive_skb+0x3f2/0x43f
[ 4437.367236]  [<ffffffff8128d31d>] ? netif_receive_skb+0x7e/0x84
[ 4437.367239]  [<ffffffff8128d7dd>] ? napi_gro_receive+0x1c/0x29
[ 4437.367241]  [<ffffffff8128d398>] ? napi_skb_finish+0x1c/0x31
[ 4437.367253]  [<ffffffffa026bde3>] ? e1000_clean_rx_irq+0x1f3/0x290 [e1000e]
[ 4437.367261]  [<ffffffffa026c26c>] ? e1000_clean+0x69/0x208 [e1000e]
[ 4437.367264]  [<ffffffff8128d8fb>] ? net_rx_action+0xa4/0x1c0
[ 4437.367268]  [<ffffffff8104c581>] ? __do_softirq+0xc0/0x188
[ 4437.367272]  [<ffffffff81351fac>] ? call_softirq+0x1c/0x30
[ 4437.367276]  [<ffffffff8100f98d>] ? do_softirq+0x3c/0x7b
[ 4437.367278]  [<ffffffff8104c87c>] ? irq_exit+0x3d/0xa7
[ 4437.367281]  [<ffffffff8100f6b4>] ? do_IRQ+0x81/0x97
[ 4437.367285]  [<ffffffff8134ba2e>] ? common_interrupt+0x6e/0x6e
[ 4437.367287]  <EOI>  [<ffffffffa008b32c>] ? dec128+0x434/0x80c [aes_x86_64]
[ 4437.367307]  [<ffffffffa0085164>] ? crypt+0xae/0x101 [xts]
[ 4437.367313]  [<ffffffffa008b712>] ? aes_decrypt+0xe/0xe [aes_x86_64]
[ 4437.367320]  [<ffffffffa008b704>] ? dec128+0x80c/0x80c [aes_x86_64]
[ 4437.367327]  [<ffffffffa00851f6>] ? decrypt+0x3f/0x44 [xts]
[ 4437.367331]  [<ffffffff8118cdb3>] ? async_decrypt+0x37/0x3c
[ 4437.367338]  [<ffffffffa0105e2a>] ? crypt_convert+0x22f/0x2c4 [dm_crypt]
[ 4437.367342]  [<ffffffff8100d02f>] ? load_TLS+0x7/0xa
[ 4437.367348]  [<ffffffffa01061b8>] ? kcryptd_crypt+0x56/0x342 [dm_crypt]
[ 4437.367352]  [<ffffffff81038cd2>] ? finish_task_switch+0x86/0xb7
[ 4437.367355]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
[ 4437.367358]  [<ffffffff8134e1d2>] ? sub_preempt_count+0x83/0x94
[ 4437.367361]  [<ffffffff8103612b>] ? need_resched+0x1a/0x23
[ 4437.367368]  [<ffffffffa0106162>] ? crypt_convert_init.isra.14+0x4f/0x4f [dm_crypt]
[ 4437.367372]  [<ffffffff8105b867>] ? process_one_work+0x16d/0x298
[ 4437.367375]  [<ffffffff8105c84a>] ? worker_thread+0xc2/0x145
[ 4437.367378]  [<ffffffff8105c788>] ? manage_workers.isra.23+0x15b/0x15b
[ 4437.367381]  [<ffffffff8105f9fe>] ? kthread+0x76/0x7e
[ 4437.367384]  [<ffffffff81351eb4>] ? kernel_thread_helper+0x4/0x10
[ 4437.367387]  [<ffffffff8105f988>] ? kthread_worker_fn+0x139/0x139
[ 4437.367390]  [<ffffffff81351eb0>] ? gs_change+0x13/0x13
[ 4437.367392] Mem-Info:
[ 4437.367393] Node 0 DMA per-cpu:
[ 4437.367396] CPU    0: hi:    0, btch:   1 usd:   0
[ 4437.367397] CPU    1: hi:    0, btch:   1 usd:   0
[ 4437.367399] Node 0 DMA32 per-cpu:
[ 4437.367401] CPU    0: hi:  186, btch:  31 usd: 164
[ 4437.367403] CPU    1: hi:  186, btch:  31 usd: 111
[ 4437.367405] Node 0 Normal per-cpu:
[ 4437.367407] CPU    0: hi:  186, btch:  31 usd: 114
[ 4437.367409] CPU    1: hi:  186, btch:  31 usd: 158
[ 4437.367413] active_anon:391300 inactive_anon:132951 isolated_anon:0
[ 4437.367414]  active_file:136666 inactive_file:140710 isolated_file:31
[ 4437.367415]  unevictable:1 dirty:3402 writeback:26688 unstable:7844
[ 4437.367416]  free:36509 slab_reclaimable:85289 slab_unreclaimable:35524
[ 4437.367417]  mapped:18088 shmem:35934 pagetables:9300 bounce:0
[ 4437.367419] Node 0 DMA free:15712kB min:260kB low:324kB high:388kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:36kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:36kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:160kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:40833 all_unreclaimable? yes
[ 4437.367428] lowmem_reserve[]: 0 2960 3907 3907
[ 4437.367432] Node 0 DMA32 free:110732kB min:51004kB low:63752kB high:76504kB active_anon:1380396kB inactive_anon:345140kB active_file:422008kB inactive_file:437440kB unevictable:4kB isolated(anon):0kB isolated(file):124kB present:3031688kB mlocked:4kB dirty:7148kB writeback:72004kB mapped:39424kB shmem:64836kB slab_reclaimable:212408kB slab_unreclaimable:80516kB kernel_stack:1720kB pagetables:19252kB unstable:23964kB bounce:0kB writeback_tmp:0kB pages_scanned:63 all_unreclaimable? no
[ 4437.367442] lowmem_reserve[]: 0 0 946 946
[ 4437.367445] Node 0 Normal free:19592kB min:16312kB low:20388kB high:24468kB active_anon:184804kB inactive_anon:186664kB active_file:124656kB inactive_file:125364kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:969600kB mlocked:0kB dirty:6460kB writeback:34712kB mapped:32928kB shmem:78900kB slab_reclaimable:128748kB slab_unreclaimable:61420kB kernel_stack:2792kB pagetables:17948kB unstable:7412kB bounce:0kB writeback_tmp:0kB pages_scanned:89 all_unreclaimable? no
[ 4437.367455] lowmem_reserve[]: 0 0 0 0
[ 4437.367458] Node 0 DMA: 2*4kB 1*8kB 1*16kB 0*32kB 1*64kB 2*128kB 2*256kB 1*512kB 2*1024kB 2*2048kB 2*4096kB = 15712kB
[ 4437.367467] Node 0 DMA32: 25961*4kB 73*8kB 8*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 110732kB
[ 4437.367475] Node 0 Normal: 4134*4kB 0*8kB 1*16kB 1*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 19656kB
[ 4437.367484] 317456 total pagecache pages
[ 4437.367485] 4042 pages in swap cache
[ 4437.367487] Swap cache stats: add 31786, delete 27744, find 10282/11070
[ 4437.367489] Free swap  = 4012560kB
[ 4437.367490] Total swap = 4106248kB
[ 4437.370978] 1032176 pages RAM
[ 4437.370978] 42834 pages reserved
[ 4437.370978] 390787 pages shared
[ 4437.370978] 750687 pages non-shared


[ 8640.516177] INFO: task flush-0:37:7122 blocked for more than 120 seconds.
[ 8640.516182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8640.516186] flush-0:37      D ffff88013bc93580     0  7122      2 0x00000080
[ 8640.516192]  ffff880072c28810 0000000000000046 ffff880100000000 ffff880134f2e180
[ 8640.516199]  0000000000013580 ffff88006d491fd8 ffff88006d491fd8 ffff880072c28810
[ 8640.516205]  ffff88013bfd1c50 000000018134b58b ffff88010c3cc1b0 ffff88006d491d18
[ 8640.516211] Call Trace:
[ 8640.516221]  [<ffffffff8110e81a>] ? inode_owner_or_capable+0x36/0x36
[ 8640.516226]  [<ffffffff8110e820>] ? inode_wait+0x6/0xa
[ 8640.516232]  [<ffffffff8134a72c>] ? __wait_on_bit+0x3e/0x71
[ 8640.516241]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
[ 8640.516245]  [<ffffffff81119674>] ? inode_wait_for_writeback+0xa2/0xc8
[ 8640.516249]  [<ffffffff810600c9>] ? autoremove_wake_function+0x2a/0x2a
[ 8640.516252]  [<ffffffff8111b4b4>] ? wb_writeback+0x226/0x255
[ 8640.516255]  [<ffffffff8134e27d>] ? add_preempt_count+0x9a/0x9c
[ 8640.516258]  [<ffffffff8111b8d4>] ? wb_do_writeback+0x150/0x1b2
[ 8640.516261]  [<ffffffff8111b9c5>] ? bdi_writeback_thread+0x8f/0x204
[ 8640.516264]  [<ffffffff8111b936>] ? wb_do_writeback+0x1b2/0x1b2
[ 8640.516266]  [<ffffffff8105f9fe>] ? kthread+0x76/0x7e
[ 8640.516270]  [<ffffffff81351eb4>] ? kernel_thread_helper+0x4/0x10
[ 8640.516273]  [<ffffffff8105f988>] ? kthread_worker_fn+0x139/0x139
[ 8640.516275]  [<ffffffff81351eb0>] ? gs_change+0x13/0x13
[ 8640.516281] INFO: task cp:7568 blocked for more than 120 seconds.
[ 8640.516283] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 8640.516284] cp              D ffff88013bc13580     0  7568   6744 0x00000080
[ 8640.516288]  ffff880123976750 0000000000000082 0000000000000000 ffffffff8160d020
[ 8640.516292]  0000000000013580 ffff88001b3a9fd8 ffff88001b3a9fd8 ffff880123976750
[ 8640.516295]  0000000000000001 0000000181066767 ffff880131463e50 ffff88013bc13e08
[ 8640.516299] Call Trace:
[ 8640.516303]  [<ffffffff810b5d03>] ? __lock_page+0x66/0x66
[ 8640.516306]  [<ffffffff8134a2ec>] ? io_schedule+0x58/0x6f
[ 8640.516308]  [<ffffffff810b5d09>] ? sleep_on_page+0x6/0xa
[ 8640.516311]  [<ffffffff8134a72c>] ? __wait_on_bit+0x3e/0x71
[ 8640.516313]  [<ffffffff810b5e51>] ? wait_on_page_bit+0x6e/0x73
[ 8640.516316]  [<ffffffff810600c9>] ? autoremove_wake_function+0x2a/0x2a
[ 8640.516319]  [<ffffffff810b5f29>] ? filemap_fdatawait_range+0x74/0x139
[ 8640.516327]  [<ffffffff8111acab>] ? writeback_single_inode+0x155/0x2f4
[ 8640.516330]  [<ffffffff8111ae94>] ? sync_inode+0x4a/0x6f
[ 8640.516343]  [<ffffffffa06b9b02>] ? nfs_wb_all+0x39/0x3e [nfs]
[ 8640.516351]  [<ffffffffa06aeed1>] ? nfs_setattr+0x8e/0xf6 [nfs]
[ 8640.516354]  [<ffffffff811104c3>] ? notify_change+0x177/0x24f
[ 8640.516357]  [<ffffffff8111e85c>] ? utimes_common+0x10c/0x135
[ 8640.516361]  [<ffffffff810fd55a>] ? fget+0x50/0x57
[ 8640.516364]  [<ffffffff8111e90f>] ? do_utimes+0x8a/0xd6
[ 8640.516367]  [<ffffffff810fc7a2>] ? vfs_read+0x9f/0xe6
[ 8640.516369]  [<ffffffff8111ea24>] ? sys_utimensat+0x64/0x6b
[ 8640.516372]  [<ffffffff8134fd52>] ? system_call_fastpath+0x16/0x1b


[ 9654.042164] ieee80211 phy0: failed to reallocate TX buffer
[ 9654.042189] ieee80211 phy0: failed to reallocate TX buffer
(120,000 lines of this)

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
@ 2012-03-29 16:41       ` Marc MERLIN
  0 siblings, 0 replies; 72+ messages in thread
From: Marc MERLIN @ 2012-03-29 16:41 UTC (permalink / raw)
  To: linux-wireless, netdev

As an update I upgraded the laptop from 4GB to 8GB and I still cannot to a big
copy via NFS without the laptop hanging pretty badly.

To be clear, I've had the problem both over wired ethernet (e1000e) and
intel 5300agn (iwlwifi)

Doing the same copy with rsync over ssh worked.
I also did a big NFS copy inbound instead of outbound and that seems to have
worke too.

Can someone tell me there is something I can do to work around the problem,
and get the underlying problem fixed.
(well, looks like not using NFS is the workaround)

Thanks,
Marc

I tried the copy over wifi this time instead of e1000e, and got:

mc: page allocation failure: order:1, mode:0x20
Pid: 7099, comm: mc Tainted: G        W  O 3.2.8-amd64-volpreempt-noide-20120208 #1
Call Trace:
 <IRQ>  [<ffffffff810b9ec0>] ? warn_alloc_failed+0x11f/0x132
 [<ffffffff810bcdaa>] ? __alloc_pages_nodemask+0x6b1/0x72f
 [<ffffffff810ec911>] ? kmem_getpages+0x4c/0xd9
 [<ffffffff810ec911>] ? kmem_getpages+0x4c/0xd9
 [<ffffffff810edd21>] ? fallback_alloc+0x123/0x1c2
 [<ffffffff812846db>] ? pskb_expand_head+0xe0/0x24a
 [<ffffffff810ee215>] ? __kmalloc+0xb2/0x10a
 [<ffffffff812846db>] ? pskb_expand_head+0xe0/0x24a
 [<ffffffffa096a2c1>] ? ieee80211_skb_resize+0x64/0x9d [mac80211]
 [<ffffffffa096c252>] ? ieee80211_subif_start_xmit+0x705/0x883 [mac80211]
 [<ffffffff81036108>] ? test_tsk_need_resched+0xe/0x17
 [<ffffffff8128e767>] ? dev_hard_start_xmit+0x40b/0x552
 [<ffffffff8104c8ec>] ? raise_softirq_irqoff+0x6/0x27
 [<ffffffff812a4adc>] ? sch_direct_xmit+0x63/0x13a
 [<ffffffff8128eb8e>] ? dev_queue_xmit+0x2e0/0x4b5
 [<ffffffff812b764d>] ? ip_finish_output2+0x1c7/0x218
 [<ffffffff812b86aa>] ? __ip_flush_pending_frames.isra.29+0x69/0x69
 [<ffffffff812b8a6a>] ? ip_queue_xmit+0x2cd/0x30d
 [<ffffffff81066be9>] ? getnstimeofday+0x4a/0x7b
 [<ffffffff812ca1d2>] ? tcp_transmit_skb+0x6d7/0x70a
 [<ffffffff812cac5f>] ? tcp_write_xmit+0x698/0x7a1
 [<ffffffff812c77bf>] ? tcp_ack+0x14e3/0x1658
 [<ffffffff812c89bd>] ? tcp_established_options+0x2b/0x9e
 [<ffffffff812cada9>] ? __tcp_push_pending_frames+0x18/0x44
 [<ffffffff812c4e27>] ? tcp_data_snd_check+0x2c/0xfd
 [<ffffffff812c86c5>] ? tcp_rcv_established+0x4f0/0x549
 [<ffffffff812ce735>] ? tcp_v4_do_rcv+0x166/0x323
 [<ffffffff812cfdce>] ? tcp_v4_rcv+0x404/0x65d
 [<ffffffff81036108>] ? test_tsk_need_resched+0xe/0x17
 [<ffffffff812b4d55>] ? ip_local_deliver_finish+0x148/0x1ba
 [<ffffffff8128cfa4>] ? __netif_receive_skb+0x3f2/0x43f
 [<ffffffff8128d31d>] ? netif_receive_skb+0x7e/0x84
 [<ffffffffa0966bd6>] ? ieee80211_deliver_skb+0xbb/0xf1 [mac80211]
 [<ffffffffa0967f32>] ? ieee80211_rx_handlers+0x1041/0x18a7 [mac80211]
 [<ffffffff810528d2>] ? lock_timer_base.isra.29+0x23/0x47
 [<ffffffff81071629>] ? arch_local_irq_save+0x11/0x17
 [<ffffffff8134b58b>] ? _raw_spin_lock_irqsave+0x1c/0x41
 [<ffffffffa0966730>] ? ieee80211_release_reorder_frame+0x35/0x4a [mac80211]
 [<ffffffffa0968fb2>] ? ieee80211_prepare_and_rx_handle+0x81a/0x872 [mac80211]
 [<ffffffffa09696e6>] ? ieee80211_rx+0x6dc/0x706 [mac80211]
 [<ffffffffa09a20e4>] ? iwlagn_rx_reply_rx+0x3c2/0x3dc [iwlwifi]
 [<ffffffff810398a1>] ? resched_task+0x48/0x6c
 [<ffffffffa09ab9f8>] ? iwl_irq_tasklet+0x446/0x6df [iwlwifi]
 [<ffffffff81039ff4>] ? check_preempt_curr+0x52/0x5f
 [<ffffffff8104c445>] ? tasklet_action+0x79/0xc8
 [<ffffffff8104c581>] ? __do_softirq+0xc0/0x188
 [<ffffffff81351fac>] ? call_softirq+0x1c/0x30
 [<ffffffff8100f98d>] ? do_softirq+0x3c/0x7b
 [<ffffffff8104c87c>] ? irq_exit+0x3d/0xa7
 [<ffffffff8100f6b4>] ? do_IRQ+0x81/0x97
 [<ffffffff8134ba2e>] ? common_interrupt+0x6e/0x6e
 <EOI>  [<ffffffff8134fe84>] ? sysret_audit+0x16/0x20
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd: 173
CPU    1: hi:  186, btch:  31 usd: 134
Node 0 Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd: 157
CPU    1: hi:  186, btch:  31 usd: 101
active_anon:1031319 inactive_anon:207860 isolated_anon:0
 active_file:234263 inactive_file:341759 isolated_file:0
 unevictable:9 dirty:21221 writeback:96084 unstable:129
 free:40765 slab_reclaimable:89393 slab_unreclaimable:19850
 mapped:29109 shmem:65088 pagetables:21560 bounce:0
Node 0 DMA free:15908kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2960 7947 7947
Node 0 DMA32 free:94360kB min:25128kB low:31408kB high:37692kB active_anon:1122272kB inactive_anon:301128kB active_file:479144kB inactive_file:802228kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3031688kB mlocked:0kB dirty:25936kB writeback:193704kB mapped:41696kB shmem:78944kB slab_reclaimable:174676kB slab_unreclaimable:19700kB kernel_stack:3056kB pagetables:11056kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no

Afer a few of these, I just got, a loop of 'failed to reallocate TX buffer'
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd:  32
CPU    1: hi:  186, btch:  31 usd:  20
Node 0 Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd: 132
CPU    1: hi:  186, btch:  31 usd: 174
active_anon:1025929 inactive_anon:209027 isolated_anon:17
 active_file:235971 inactive_file:313547 isolated_file:6
 unevictable:9 dirty:86311 writeback:91894 unstable:4359
 free:70274 slab_reclaimable:88825 slab_unreclaimable:21955
 mapped:28146 shmem:66271 pagetables:21542 bounce:0
Node 0 DMA free:15908kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2960 7947 7947
Node 0 DMA32 free:212292kB min:25128kB low:31408kB high:37692kB active_anon:1110276kB inactive_anon: 304520kB active_file:479356kB inactive_file:691616kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3031688kB mlocked:0kB dirty:111500kB writeback:193764kB mapped:41348kB shmem:82336kB slab_reclaimable:173504kB slab_unreclaimable:23312kB kernel_stack:3056kB pagetables:11028kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 4986 4986
Node 0 Normal free:52896kB min:42324kB low:52904kB high:63484kB active_anon:2993440kB inactive_anon:531588kB active_file:464528kB inactive_file:562572kB unevictable:36kB isolated(anon):68kB isolated(file):24kB present:5106560kB mlocked:36kB dirty:233744kB writeback:173812kB mapped:71236kB shmem:182748kB slab_reclaimable:181796kB slab_unreclaimable:64508kB kernel_stack:3736kB pagetables:75140kB unstable:17436kB bounce:0kB writeback_tmp:0kB pages_scanned:98 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15908kB
Node 0 DMA32: 51901*4kB 1*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 212316kB
Node 0 Normal: 12200*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 52896kB
682814 total pagecache pages
67047 pages in swap cache
Swap cache stats: add 802793, delete 735746, find 537857/581722
Free swap  = 2660592kB
Total swap = 4106248kB
2080752 pages RAM
57174 pages reserved
842957 pages shared
1472720 pages non-shared
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
@ 2012-03-29 16:41       ` Marc MERLIN
  0 siblings, 0 replies; 72+ messages in thread
From: Marc MERLIN @ 2012-03-29 16:41 UTC (permalink / raw)
  To: linux-wireless-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA

As an update I upgraded the laptop from 4GB to 8GB and I still cannot to a big
copy via NFS without the laptop hanging pretty badly.

To be clear, I've had the problem both over wired ethernet (e1000e) and
intel 5300agn (iwlwifi)

Doing the same copy with rsync over ssh worked.
I also did a big NFS copy inbound instead of outbound and that seems to have
worke too.

Can someone tell me there is something I can do to work around the problem,
and get the underlying problem fixed.
(well, looks like not using NFS is the workaround)

Thanks,
Marc

I tried the copy over wifi this time instead of e1000e, and got:

mc: page allocation failure: order:1, mode:0x20
Pid: 7099, comm: mc Tainted: G        W  O 3.2.8-amd64-volpreempt-noide-20120208 #1
Call Trace:
 <IRQ>  [<ffffffff810b9ec0>] ? warn_alloc_failed+0x11f/0x132
 [<ffffffff810bcdaa>] ? __alloc_pages_nodemask+0x6b1/0x72f
 [<ffffffff810ec911>] ? kmem_getpages+0x4c/0xd9
 [<ffffffff810ec911>] ? kmem_getpages+0x4c/0xd9
 [<ffffffff810edd21>] ? fallback_alloc+0x123/0x1c2
 [<ffffffff812846db>] ? pskb_expand_head+0xe0/0x24a
 [<ffffffff810ee215>] ? __kmalloc+0xb2/0x10a
 [<ffffffff812846db>] ? pskb_expand_head+0xe0/0x24a
 [<ffffffffa096a2c1>] ? ieee80211_skb_resize+0x64/0x9d [mac80211]
 [<ffffffffa096c252>] ? ieee80211_subif_start_xmit+0x705/0x883 [mac80211]
 [<ffffffff81036108>] ? test_tsk_need_resched+0xe/0x17
 [<ffffffff8128e767>] ? dev_hard_start_xmit+0x40b/0x552
 [<ffffffff8104c8ec>] ? raise_softirq_irqoff+0x6/0x27
 [<ffffffff812a4adc>] ? sch_direct_xmit+0x63/0x13a
 [<ffffffff8128eb8e>] ? dev_queue_xmit+0x2e0/0x4b5
 [<ffffffff812b764d>] ? ip_finish_output2+0x1c7/0x218
 [<ffffffff812b86aa>] ? __ip_flush_pending_frames.isra.29+0x69/0x69
 [<ffffffff812b8a6a>] ? ip_queue_xmit+0x2cd/0x30d
 [<ffffffff81066be9>] ? getnstimeofday+0x4a/0x7b
 [<ffffffff812ca1d2>] ? tcp_transmit_skb+0x6d7/0x70a
 [<ffffffff812cac5f>] ? tcp_write_xmit+0x698/0x7a1
 [<ffffffff812c77bf>] ? tcp_ack+0x14e3/0x1658
 [<ffffffff812c89bd>] ? tcp_established_options+0x2b/0x9e
 [<ffffffff812cada9>] ? __tcp_push_pending_frames+0x18/0x44
 [<ffffffff812c4e27>] ? tcp_data_snd_check+0x2c/0xfd
 [<ffffffff812c86c5>] ? tcp_rcv_established+0x4f0/0x549
 [<ffffffff812ce735>] ? tcp_v4_do_rcv+0x166/0x323
 [<ffffffff812cfdce>] ? tcp_v4_rcv+0x404/0x65d
 [<ffffffff81036108>] ? test_tsk_need_resched+0xe/0x17
 [<ffffffff812b4d55>] ? ip_local_deliver_finish+0x148/0x1ba
 [<ffffffff8128cfa4>] ? __netif_receive_skb+0x3f2/0x43f
 [<ffffffff8128d31d>] ? netif_receive_skb+0x7e/0x84
 [<ffffffffa0966bd6>] ? ieee80211_deliver_skb+0xbb/0xf1 [mac80211]
 [<ffffffffa0967f32>] ? ieee80211_rx_handlers+0x1041/0x18a7 [mac80211]
 [<ffffffff810528d2>] ? lock_timer_base.isra.29+0x23/0x47
 [<ffffffff81071629>] ? arch_local_irq_save+0x11/0x17
 [<ffffffff8134b58b>] ? _raw_spin_lock_irqsave+0x1c/0x41
 [<ffffffffa0966730>] ? ieee80211_release_reorder_frame+0x35/0x4a [mac80211]
 [<ffffffffa0968fb2>] ? ieee80211_prepare_and_rx_handle+0x81a/0x872 [mac80211]
 [<ffffffffa09696e6>] ? ieee80211_rx+0x6dc/0x706 [mac80211]
 [<ffffffffa09a20e4>] ? iwlagn_rx_reply_rx+0x3c2/0x3dc [iwlwifi]
 [<ffffffff810398a1>] ? resched_task+0x48/0x6c
 [<ffffffffa09ab9f8>] ? iwl_irq_tasklet+0x446/0x6df [iwlwifi]
 [<ffffffff81039ff4>] ? check_preempt_curr+0x52/0x5f
 [<ffffffff8104c445>] ? tasklet_action+0x79/0xc8
 [<ffffffff8104c581>] ? __do_softirq+0xc0/0x188
 [<ffffffff81351fac>] ? call_softirq+0x1c/0x30
 [<ffffffff8100f98d>] ? do_softirq+0x3c/0x7b
 [<ffffffff8104c87c>] ? irq_exit+0x3d/0xa7
 [<ffffffff8100f6b4>] ? do_IRQ+0x81/0x97
 [<ffffffff8134ba2e>] ? common_interrupt+0x6e/0x6e
 <EOI>  [<ffffffff8134fe84>] ? sysret_audit+0x16/0x20
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd: 173
CPU    1: hi:  186, btch:  31 usd: 134
Node 0 Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd: 157
CPU    1: hi:  186, btch:  31 usd: 101
active_anon:1031319 inactive_anon:207860 isolated_anon:0
 active_file:234263 inactive_file:341759 isolated_file:0
 unevictable:9 dirty:21221 writeback:96084 unstable:129
 free:40765 slab_reclaimable:89393 slab_unreclaimable:19850
 mapped:29109 shmem:65088 pagetables:21560 bounce:0
Node 0 DMA free:15908kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2960 7947 7947
Node 0 DMA32 free:94360kB min:25128kB low:31408kB high:37692kB active_anon:1122272kB inactive_anon:301128kB active_file:479144kB inactive_file:802228kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3031688kB mlocked:0kB dirty:25936kB writeback:193704kB mapped:41696kB shmem:78944kB slab_reclaimable:174676kB slab_unreclaimable:19700kB kernel_stack:3056kB pagetables:11056kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no

Afer a few of these, I just got, a loop of 'failed to reallocate TX buffer'
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd:  32
CPU    1: hi:  186, btch:  31 usd:  20
Node 0 Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd: 132
CPU    1: hi:  186, btch:  31 usd: 174
active_anon:1025929 inactive_anon:209027 isolated_anon:17
 active_file:235971 inactive_file:313547 isolated_file:6
 unevictable:9 dirty:86311 writeback:91894 unstable:4359
 free:70274 slab_reclaimable:88825 slab_unreclaimable:21955
 mapped:28146 shmem:66271 pagetables:21542 bounce:0
Node 0 DMA free:15908kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 2960 7947 7947
Node 0 DMA32 free:212292kB min:25128kB low:31408kB high:37692kB active_anon:1110276kB inactive_anon: 304520kB active_file:479356kB inactive_file:691616kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3031688kB mlocked:0kB dirty:111500kB writeback:193764kB mapped:41348kB shmem:82336kB slab_reclaimable:173504kB slab_unreclaimable:23312kB kernel_stack:3056kB pagetables:11028kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 4986 4986
Node 0 Normal free:52896kB min:42324kB low:52904kB high:63484kB active_anon:2993440kB inactive_anon:531588kB active_file:464528kB inactive_file:562572kB unevictable:36kB isolated(anon):68kB isolated(file):24kB present:5106560kB mlocked:36kB dirty:233744kB writeback:173812kB mapped:71236kB shmem:182748kB slab_reclaimable:181796kB slab_unreclaimable:64508kB kernel_stack:3736kB pagetables:75140kB unstable:17436kB bounce:0kB writeback_tmp:0kB pages_scanned:98 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15908kB
Node 0 DMA32: 51901*4kB 1*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 212316kB
Node 0 Normal: 12200*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 52896kB
682814 total pagecache pages
67047 pages in swap cache
Swap cache stats: add 802793, delete 735746, find 537857/581722
Free swap  = 2660592kB
Total swap = 4106248kB
2080752 pages RAM
57174 pages reserved
842957 pages shared
1472720 pages non-shared
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer
ieee80211 phy0: failed to reallocate TX buffer

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
  2012-03-29 16:38 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure) Marc MERLIN
  2012-03-11 18:32 ` Marc MERLIN
@ 2012-03-29 18:09 ` Ben Hutchings
  2012-03-29 21:19     ` Marc MERLIN
  2012-04-09 17:20   ` Marc MERLIN
  1 sibling, 2 replies; 72+ messages in thread
From: Ben Hutchings @ 2012-03-29 18:09 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-wireless, netdev

On Thu, 2012-03-29 at 09:38 -0700, Marc MERLIN wrote:
[...]
> Below are some sysreq dumps I took (syslog to local disk was still
> working fine). I know I have Tainted 'G', and I have no idea where that
> came from, sorry :-/
[...]

'G' isn't a taint flag, but the following 'O' is; it means you have one
or more out-of-tree modules loaded.  Care to tell us what they are?

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
@ 2012-03-29 21:19     ` Marc MERLIN
  0 siblings, 0 replies; 72+ messages in thread
From: Marc MERLIN @ 2012-03-29 21:19 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-wireless, netdev

On Thu, Mar 29, 2012 at 07:09:35PM +0100, Ben Hutchings wrote:
> On Thu, 2012-03-29 at 09:38 -0700, Marc MERLIN wrote:
> [...]
> > Below are some sysreq dumps I took (syslog to local disk was still
> > working fine). I know I have Tainted 'G', and I have no idea where that
> > came from, sorry :-/
> [...]
> 
> 'G' isn't a taint flag, but the following 'O' is; it means you have one
> or more out-of-tree modules loaded.  Care to tell us what they are?

All I can think of is virtualbox, but I'm almost certain I did not have
virtualbox loaded (i.e. modules disabled so they wouldn't load) for my first
test with wired ethernet (yes, I know that virtualbox modules are known 
for causing problems).

Ah, yes, I had another out of tree module:
http://www.thinkwiki.org/wiki/Tp_smapi
(on thinkpad W500)

Module                  Size  Used by
hdaps                  13616  1 
tp_smapi               23531  0 
thinkpad_ec            12857  2 hdaps,tp_smapi

So, I suppose the next step is to reproduce without those modules loaded,
which I'll do next.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
@ 2012-03-29 21:19     ` Marc MERLIN
  0 siblings, 0 replies; 72+ messages in thread
From: Marc MERLIN @ 2012-03-29 21:19 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA

On Thu, Mar 29, 2012 at 07:09:35PM +0100, Ben Hutchings wrote:
> On Thu, 2012-03-29 at 09:38 -0700, Marc MERLIN wrote:
> [...]
> > Below are some sysreq dumps I took (syslog to local disk was still
> > working fine). I know I have Tainted 'G', and I have no idea where that
> > came from, sorry :-/
> [...]
> 
> 'G' isn't a taint flag, but the following 'O' is; it means you have one
> or more out-of-tree modules loaded.  Care to tell us what they are?

All I can think of is virtualbox, but I'm almost certain I did not have
virtualbox loaded (i.e. modules disabled so they wouldn't load) for my first
test with wired ethernet (yes, I know that virtualbox modules are known 
for causing problems).

Ah, yes, I had another out of tree module:
http://www.thinkwiki.org/wiki/Tp_smapi
(on thinkpad W500)

Module                  Size  Used by
hdaps                  13616  1 
tp_smapi               23531  0 
thinkpad_ec            12857  2 hdaps,tp_smapi

So, I suppose the next step is to reproduce without those modules loaded,
which I'll do next.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
  2012-03-29 18:09 ` Ben Hutchings
  2012-03-29 21:19     ` Marc MERLIN
@ 2012-04-09 17:20   ` Marc MERLIN
  2012-04-09 18:12     ` David Miller
  1 sibling, 1 reply; 72+ messages in thread
From: Marc MERLIN @ 2012-04-09 17:20 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: linux-wireless, netdev

On Thu, Mar 29, 2012 at 07:09:35PM +0100, Ben Hutchings wrote:
> On Thu, 2012-03-29 at 09:38 -0700, Marc MERLIN wrote:
> [...]
> > Below are some sysreq dumps I took (syslog to local disk was still
> > working fine). I know I have Tainted 'G', and I have no idea where that
> > came from, sorry :-/
> [...]
> 
> 'G' isn't a taint flag, but the following 'O' is; it means you have one
> or more out-of-tree modules loaded.  Care to tell us what they are?

Ok, I just had time to reproduce this without tainting. It was with 3.2.5
though since I'm debugging a battery power use issue too.

I confirmed that the problem only happens when I do NFS copies. Doing the
same copy with rsync over SSH works fine.

I'm not much of a kernel guy, but it looks like there is a memory problem,
which in turn causes the network driver to deadlock waiting on RAM with 
interrupts turned off (mouse cursor won't move). Eventually it gets a bit of
the special RAM it seems to need, at that point, my frozen mouse pointer
unfreezes until it locks again soon after.

So there are 2 problems
1) memory issue
2) driver waits for RAM with interrupts turned off

The log below shows offlineimap first getting memory allocation problems
as soon after I start the copy, then firefox, and some:
ieee80211 phy0: failed to reallocate TX buffer

No taint flags this time.
I'll paste a few lines below.

Any idea what's going on, what bucket of RAM is an issue for the driver
(total RAM is more than plenty), and where I should go from there?

kernel config: http://marc.merlins.org/tmp/config.txt

Thanks,
Marc

09:44:12  offlineimap: page allocation failure: order:1, mode:0x20
09:44:12  Pid: 6269, comm: offlineimap Tainted: G           O 3.2.5amd64-volpreempt-noide-20120208 #1
09:44:12  Call Trace:
09:44:12   <IRQ>  [<ffffffff810b755a>] ? warn_alloc_failed+0x11a/0x12d
09:44:12   [<ffffffff810ba394>] ? __alloc_pages_nodemask+0x6b2/0x726
09:44:12   [<ffffffff810e9081>] ? kmem_getpages+0x4c/0xd9
09:44:12   [<ffffffff810e9081>] ? kmem_getpages+0x4c/0xd9
09:44:12   [<ffffffff810ea421>] ? fallback_alloc+0x123/0x1c2
09:44:12   [<ffffffff8127daae>] ? pskb_expand_head+0xe0/0x24a
09:44:12   [<ffffffff810ea923>] ? __kmalloc+0xba/0x112
09:44:12   [<ffffffff8127daae>] ? pskb_expand_head+0xe0/0x24a
09:44:12   [<ffffffffa07a81c0>] ? ieee80211_skb_resize+0x64/0x9d [mac80211]
09:44:12   [<ffffffffa07aa087>] ? ieee80211_subif_start_xmit+0x68e/0x80c [mac80211]
09:44:12   [<ffffffffa07922c8>] ? ieee80211_tx_status_irqsafe+0x2e/0x7f [mac80211]
09:44:12   [<ffffffff812877f3>] ? dev_hard_start_xmit+0x3fc/0x543
09:44:12   [<ffffffff81070551>] ? arch_local_irq_save+0x11/0x17
09:44:12   [<ffffffff8129da01>] ? sch_direct_xmit+0x5e/0x12f
09:44:12   [<ffffffff8129dbc9>] ? __qdisc_run+0xf7/0x10f
09:44:12   [<ffffffff81282d56>] ? net_tx_action+0xf1/0x128
09:44:12   [<ffffffff8104bc88>] ? __do_softirq+0xb9/0x177
09:44:12   [<ffffffff81065863>] ? timekeeping_get_ns+0xd/0x2a
09:44:12   [<ffffffff81349dac>] ? call_softirq+0x1c/0x30
09:44:12   [<ffffffff8100f875>] ? do_softirq+0x3c/0x7b
09:44:12   [<ffffffff8104bef0>] ? irq_exit+0x3c/0x9a
09:44:12   [<ffffffff8100f5a5>] ? do_IRQ+0x82/0x98
09:44:12   [<ffffffff81342e2e>] ? common_interrupt+0x6e/0x6e
09:44:12   <EOI>  [<ffffffff81061e33>] ? hrtimer_cancel+0xc/0x16
09:44:12   [<ffffffff8110629a>] ? select_estimate_accuracy+0xcf/0xed
09:44:12   [<ffffffff8110629a>] ? select_estimate_accuracy+0xcf/0xed
09:44:12   [<ffffffff81106464>] ? do_select+0x13c/0x46a
09:44:12   [<ffffffff81034a8c>] ? walk_tg_tree_from+0x70/0x99
09:44:12   [<ffffffff811060ab>] ? poll_freewait+0x97/0x97
09:44:12   [<ffffffff8103a520>] ? update_curr+0xd0/0xf5
09:44:12   [<ffffffff8100d758>] ? __switch_to+0x1fc/0x20e
09:44:12   [<ffffffff8103ac9c>] ? dequeue_entity+0x13e/0x161
09:44:12   [<ffffffff8103891c>] ? finish_task_switch+0x88/0xb9
09:44:12   [<ffffffff813417fc>] ? __schedule+0x5ac/0x5c3
09:44:12   [<ffffffff8106c902>] ? get_futex_value_locked+0x2d/0x3d
09:44:12   [<ffffffff810349f1>] ? set_task_rq+0x23/0x35
09:44:12   [<ffffffff81035ef0>] ? cpumask_next+0x17/0x1a
09:44:12   [<ffffffff8104073f>] ? select_task_rq_fair+0x435/0x67e
09:44:12   [<ffffffff8101353d>] ? paravirt_read_tsc+0x5/0x8
09:44:12   [<ffffffff81013999>] ? native_sched_clock+0x27/0x2f
09:44:12   [<ffffffff810139a6>] ? sched_clock+0x5/0x8
09:44:12   [<ffffffff810634b0>] ? sched_clock_local+0xd/0x6f
09:44:12   [<ffffffff81026aa4>] ? _flat_send_IPI_mask+0x68/0x78
09:44:12   [<ffffffff8103f175>] ? try_to_wake_up+0x187/0x196
09:44:12   [<ffffffff81106907>] ? core_sys_select+0x175/0x21a
09:44:12   [<ffffffff8106e85a>] ? do_futex+0xab/0x7ed
09:44:12   [<ffffffff8104afef>] ? timespec_add_safe+0x32/0x63
09:44:12   [<ffffffff81028a44>] ? read_hpet+0xd/0x10
09:44:12   [<ffffffff81065863>] ? timekeeping_get_ns+0xd/0x2a
09:44:12   [<ffffffff81106a34>] ? sys_select+0x88/0xad
09:44:12   [<ffffffff81347b52>] ? system_call_fastpath+0x16/0x1b
09:44:12  Mem-Info:
09:44:12  Node 0 DMA per-cpu:
09:44:12  CPU    0: hi:    0, btch:   1 usd:   0
09:44:12  CPU    1: hi:    0, btch:   1 usd:   0
09:44:12  Node 0 DMA32 per-cpu:
09:44:12  CPU    0: hi:  186, btch:  31 usd:  96
09:44:12  CPU    1: hi:  186, btch:  31 usd:  61
09:44:12  Node 0 Normal per-cpu:
09:44:12  CPU    0: hi:  186, btch:  31 usd: 178
09:44:12  CPU    1: hi:  186, btch:  31 usd:  37
09:44:12  active_anon:992922 inactive_anon:263416 isolated_anon:0
09:44:12   active_file:59734 inactive_file:529995 isolated_file:0
09:44:12   unevictable:1 dirty:52839 writeback:339893 unstable:7544
09:44:12   free:55501 slab_reclaimable:33311 slab_unreclaimable:30518
09:44:12   mapped:18500 shmem:43519 pagetables:21479 bounce:0
09:44:12  Node 0 DMA free:15908kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
09:44:12  lowmem_reserve[]: 0 2960 7947 7947
09:44:12  Node 0 DMA32 free:153280kB min:25128kB low:31408kB high:37692kB active_anon:1579176kB inactive_anon:524080kB active_file:33708kB inactive_file:618072kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3031688kB mlocked:0kB dirty:81192kB writeback:322428kB mapped:6704kB shmem:67068kB slab_reclaimable:41408kB slab_unreclaimable:30268kB kernel_stack:2216kB pagetables:24408kB unstable:10048kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
09:44:12  lowmem_reserve[]: 0 0 4986 4986
09:44:12  Node 0 Normal free:52816kB min:42324kB low:52904kB high:63484kB active_anon:2392512kB inactive_anon:529584kB active_file:205228kB inactive_file:1501908kB unevictable:4kB isolated(anon):0kB isolated(file):0kB present:5106560kB mlocked:4kB dirty:130164kB writeback:1037144kB mapped:67296kB shmem:107008kB slab_reclaimable:91836kB slab_unreclaimable:91804kB kernel_stack:4216kB pagetables:61508kB unstable:20128kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
09:44:12  lowmem_reserve[]: 0 0 0 0
09:44:12  Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15908kB
09:44:12  Node 0 DMA32: 37156*4kB 0*8kB 1*16kB 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 153280kB
09:44:12  Node 0 Normal: 12180*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 52816kB
09:44:12  676507 total pagecache pages
09:44:12  43251 pages in swap cache
09:44:12  Swap cache stats: add 803319, delete 760068, find 297792/318561
09:44:12  Free swap  = 1763732kB
09:44:12  Total swap = 4106248kB
09:44:12  2080752 pages RAM
09:44:12  57163 pages reserved
09:44:12  988140 pages shared
09:44:12  1496507 pages non-shared
09:44:12  ieee80211 phy0: failed to reallocate TX buffer
09:44:12  kworker/0:1: page allocation failure: order:1, mode:0x20
09:44:12  Pid: 8162, comm: kworker/0:1 Tainted: G           O 3.2.5amd64-volpreempt-noide-20120208 #1
09:44:12  Call Trace:
09:44:12   <IRQ>  [<ffffffff810b755a>] ? warn_alloc_failed+0x11a/0x12d
09:44:12   [<ffffffff810ba394>] ? __alloc_pages_nodemask+0x6b2/0x726
09:44:12   [<ffffffff810e9081>] ? kmem_getpages+0x4c/0xd9
09:44:12   [<ffffffff810e9081>] ? kmem_getpages+0x4c/0xd9
09:44:12   [<ffffffff810ea421>] ? fallback_alloc+0x123/0x1c2
09:44:12   [<ffffffff8127daae>] ? pskb_expand_head+0xe0/0x24a
09:44:12   [<ffffffff810ea923>] ? __kmalloc+0xba/0x112
09:44:12   [<ffffffff8127daae>] ? pskb_expand_head+0xe0/0x24a
09:44:12   [<ffffffffa07a81c0>] ? ieee80211_skb_resize+0x64/0x9d [mac80211]
09:44:12   [<ffffffffa07aa087>] ? ieee80211_subif_start_xmit+0x68e/0x80c [mac80211]
09:44:12   [<ffffffffa07922c8>] ? ieee80211_tx_status_irqsafe+0x2e/0x7f [mac80211]
09:44:12   [<ffffffff812877f3>] ? dev_hard_start_xmit+0x3fc/0x543
09:44:12   [<ffffffff81070551>] ? arch_local_irq_save+0x11/0x17
09:44:12   [<ffffffff8129da01>] ? sch_direct_xmit+0x5e/0x12f
09:44:12   [<ffffffff8129dbc9>] ? __qdisc_run+0xf7/0x10f
09:44:12   [<ffffffff81282d56>] ? net_tx_action+0xf1/0x128
09:44:12   [<ffffffff8104bc88>] ? __do_softirq+0xb9/0x177
09:44:12   [<ffffffff8103a946>] ? check_preempt_wakeup+0x10c/0x18e
09:44:12   [<ffffffff81349dac>] ? call_softirq+0x1c/0x30
09:44:12   [<ffffffff8100f875>] ? do_softirq+0x3c/0x7b
09:44:12   [<ffffffff8104bef0>] ? irq_exit+0x3c/0x9a
09:44:12   [<ffffffff8100f5a5>] ? do_IRQ+0x82/0x98
09:44:12   [<ffffffff81342e2e>] ? common_interrupt+0x6e/0x6e
09:44:12   <EOI>  [<ffffffffa0089052>] ? dec128+0x15a/0x80c [aes_x86_64]
09:44:12   [<ffffffff8118f263>] ? crypto_cbc_decrypt_inplace.isra.4+0x84/0xef
09:44:12   [<ffffffffa06bb6ee>] ? decode_attr_time+0x2f/0x5d [nfs]
09:44:12   [<ffffffffa0089704>] ? dec128+0x80c/0x80c [aes_x86_64]
09:44:12   [<ffffffff8118f328>] ? crypto_cbc_decrypt+0x5a/0x10a
09:44:12   [<ffffffff8118756b>] ? async_decrypt+0x37/0x3c
09:44:12   [<ffffffffa0108df7>] ? crypt_convert+0x223/0x2b8 [dm_crypt]
09:44:12   [<ffffffffa0109185>] ? kcryptd_crypt+0x56/0x342 [dm_crypt]
09:44:12   [<ffffffffa010912f>] ? crypt_convert_init.isra.17+0x4f/0x4f [dm_crypt]
09:44:12   [<ffffffffa010912f>] ? crypt_convert_init.isra.17+0x4f/0x4f [dm_crypt]
09:44:12   [<ffffffffa010912f>] ? crypt_convert_init.isra.17+0x4f/0x4f [dm_crypt]
09:44:12   [<ffffffff8105ac3d>] ? process_one_work+0x163/0x284
09:44:12   [<ffffffff8105bc05>] ? worker_thread+0xc2/0x145
09:44:12   [<ffffffff8105bb43>] ? manage_workers.isra.23+0x15b/0x15b
09:44:12   [<ffffffff8105ed41>] ? kthread+0x76/0x7e
09:44:12   [<ffffffff81349cb4>] ? kernel_thread_helper+0x4/0x10
09:44:12   [<ffffffff8105eccb>] ? kthread_worker_fn+0x139/0x139
09:44:12   [<ffffffff81349cb0>] ? gs_change+0x13/0x13
09:44:12  Mem-Info:
09:44:12  Node 0 DMA per-cpu:
09:44:12  CPU    0: hi:    0, btch:   1 usd:   0
09:44:12  CPU    1: hi:    0, btch:   1 usd:   0
09:44:12  Node 0 DMA32 per-cpu:
09:44:12  CPU    0: hi:  186, btch:  31 usd:  77
09:44:12  CPU    1: hi:  186, btch:  31 usd:  53
09:44:12  Node 0 Normal per-cpu:
09:44:12  CPU    0: hi:  186, btch:  31 usd: 165
09:44:12  CPU    1: hi:  186, btch:  31 usd:  64
09:44:12  active_anon:992922 inactive_anon:263441 isolated_anon:0
09:44:12   active_file:59734 inactive_file:530249 isolated_file:0
09:44:12   unevictable:1 dirty:52950 writeback:339764 unstable:7673
09:44:12   free:55214 slab_reclaimable:33311 slab_unreclaimable:30507
09:44:12   mapped:18500 shmem:43519 pagetables:21479 bounce:0
09:44:12  Node 0 DMA free:15908kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
09:44:12  lowmem_reserve[]: 0 2960 7947 7947
09:44:12  Node 0 DMA32 free:152124kB min:25128kB low:31408kB high:37692kB active_anon:1579176kB inactive_anon:524180kB active_file:33708kB inactive_file:618972kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3031688kB mlocked:0kB dirty:81636kB writeback:322428kB mapped:6704kB shmem:67068kB slab_reclaimable:41408kB slab_unreclaimable:30456kB kernel_stack:2216kB pagetables:24408kB unstable:10048kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
09:44:12  lowmem_reserve[]: 0 0 4986 4986
09:44:12  Node 0 Normal free:52824kB min:42324kB low:52904kB high:63484kB active_anon:2392512kB inactive_anon:529584kB active_file:205228kB inactive_file:1502024kB unevictable:4kB isolated(anon):0kB isolated(file):0kB present:5106560kB mlocked:4kB dirty:130164kB writeback:1036628kB mapped:67296kB shmem:107008kB slab_reclaimable:91836kB slab_unreclaimable:91572kB kernel_stack:4216kB pagetables:61508kB unstable:20644kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
09:44:12  lowmem_reserve[]: 0 0 0 0
09:44:12  Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15908kB
09:44:12  Node 0 DMA32: 36855*4kB 0*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 152124kB
09:44:12  Node 0 Normal: 12182*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 52824kB
09:44:12  676809 total pagecache pages
09:44:12  43260 pages in swap cache
09:44:12  Swap cache stats: add 803337, delete 760077, find 297801/318574
09:44:12  Free swap  = 1763768kB
09:44:12  Total swap = 4106248kB
09:44:12  2080752 pages RAM
09:44:12  57163 pages reserved
09:44:12  988161 pages shared
09:44:12  1496613 pages non-shared
09:44:12  ieee80211 phy0: failed to reallocate TX buffer
09:44:13  firefox-bin: page allocation failure: order:1, mode:0x20
09:44:13  Pid: 6697, comm: firefox-bin Tainted: G           O 3.2.5amd64-volpreempt-noide-20120208 #1
09:44:13  Call Trace:
09:44:13   <IRQ>  [<ffffffff810b755a>] ? warn_alloc_failed+0x11a/0x12d
09:44:13   [<ffffffff810ba394>] ? __alloc_pages_nodemask+0x6b2/0x726
09:44:13   [<ffffffff810e9081>] ? kmem_getpages+0x4c/0xd9
09:44:13   [<ffffffff810e9081>] ? kmem_getpages+0x4c/0xd9
09:44:13   [<ffffffff810ea421>] ? fallback_alloc+0x123/0x1c2
09:44:13   [<ffffffff8127daae>] ? pskb_expand_head+0xe0/0x24a
09:44:13   [<ffffffff810ea923>] ? __kmalloc+0xba/0x112
09:44:13   [<ffffffff8127daae>] ? pskb_expand_head+0xe0/0x24a
09:44:13   [<ffffffffa07a81c0>] ? ieee80211_skb_resize+0x64/0x9d [mac80211]
09:44:13   [<ffffffffa07aa087>] ? ieee80211_subif_start_xmit+0x68e/0x80c [mac80211]
09:44:13   [<ffffffff812877f3>] ? dev_hard_start_xmit+0x3fc/0x543
09:44:13   [<ffffffff8129da01>] ? sch_direct_xmit+0x5e/0x12f
09:44:13   [<ffffffff8129dbc9>] ? __qdisc_run+0xf7/0x10f
09:44:13   [<ffffffff81282d56>] ? net_tx_action+0xf1/0x128
09:44:13   [<ffffffff8104bc88>] ? __do_softirq+0xb9/0x177
09:44:13   [<ffffffff8103a95f>] ? check_preempt_wakeup+0x125/0x18e
09:44:13   [<ffffffff81349dac>] ? call_softirq+0x1c/0x30
09:44:13   [<ffffffff8100f875>] ? do_softirq+0x3c/0x7b
09:44:13   [<ffffffff8104bef0>] ? irq_exit+0x3c/0x9a
09:44:13   [<ffffffff8100f5a5>] ? do_IRQ+0x82/0x98
09:44:13   [<ffffffff81342e2e>] ? common_interrupt+0x6e/0x6e
09:44:13   <EOI>  [<ffffffff81347c84>] ? sysret_audit+0x16/0x20
09:44:13  Mem-Info:
09:44:13  Node 0 DMA per-cpu:
09:44:13  CPU    0: hi:    0, btch:   1 usd:   0
09:44:13  CPU    1: hi:    0, btch:   1 usd:   0
09:44:13  Node 0 DMA32 per-cpu:
09:44:13  CPU    0: hi:  186, btch:  31 usd:  78
09:44:13  CPU    1: hi:  186, btch:  31 usd:  66
09:44:13  Node 0 Normal per-cpu:
09:44:13  CPU    0: hi:  186, btch:  31 usd: 165
09:44:13  CPU    1: hi:  186, btch:  31 usd:  70
09:44:13  active_anon:992947 inactive_anon:263499 isolated_anon:0
09:44:13   active_file:59734 inactive_file:535571 isolated_file:0
09:44:13   unevictable:1 dirty:56763 writeback:339764 unstable:7673
09:44:13   free:49597 slab_reclaimable:33311 slab_unreclaimable:30582
09:44:13   mapped:18500 shmem:43562 pagetables:21479 bounce:0
09:44:13  Node 0 DMA free:15908kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
09:44:13  lowmem_reserve[]: 0 2960 7947 7947
09:44:13  Node 0 DMA32 free:129656kB min:25128kB low:31408kB high:37692kB active_anon:1579276kB inactive_anon:524180kB active_file:33708kB inactive_file:640772kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3031688kB mlocked:0kB dirty:92588kB writeback:322428kB mapped:6704kB shmem:67068kB slab_reclaimable:41408kB slab_unreclaimable:30756kB kernel_stack:2216kB pagetables:24408kB unstable:10048kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
09:44:13  lowmem_reserve[]: 0 0 4986 4986
09:44:13  Node 0 Normal free:52824kB min:42324kB low:52904kB high:63484kB active_anon:2392512kB inactive_anon:529816kB active_file:205228kB inactive_file:1501512kB unevictable:4kB isolated(anon):0kB isolated(file):0kB present:5106560kB mlocked:4kB dirty:134464kB writeback:1036628kB mapped:67296kB shmem:107180kB slab_reclaimable:91836kB slab_unreclaimable:91572kB kernel_stack:4216kB pagetables:61508kB unstable:20644kB bounce:0kB writeback_tmp:0kB pages_scanned:1703 all_unreclaimable? no
09:44:13  lowmem_reserve[]: 0 0 0 0
09:44:13  Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15908kB
09:44:13  Node 0 DMA32: 31254*4kB 0*8kB 0*16kB 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 129656kB
09:44:13  Node 0 Normal: 12182*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 52824kB
09:44:13  682242 total pagecache pages
09:44:13  43259 pages in swap cache
09:44:13  Swap cache stats: add 803338, delete 760079, find 297802/318575
09:44:13  Free swap  = 1763768kB
09:44:13  Total swap = 4106248kB
09:44:13  2080752 pages RAM
09:44:13  57163 pages reserved
09:44:13  992010 pages shared
09:44:13  1498343 pages non-shared
09:44:13  ieee80211 phy0: failed to reallocate TX buffer
09:44:13  firefox-bin: page allocation failure: order:1, mode:0x20
09:44:13  Pid: 6706, comm: firefox-bin Tainted: G           O 3.2.5amd64-volpreempt-noide-20120208 #1
09:44:13  Call Trace:
09:44:13   <IRQ>  [<ffffffff810b755a>] ? warn_alloc_failed+0x11a/0x12d
09:44:13   [<ffffffff810ba394>] ? __alloc_pages_nodemask+0x6b2/0x726
09:44:13   [<ffffffff810e9081>] ? kmem_getpages+0x4c/0xd9
09:44:13   [<ffffffff810e9081>] ? kmem_getpages+0x4c/0xd9
09:44:13   [<ffffffff810ea421>] ? fallback_alloc+0x123/0x1c2
09:44:13   [<ffffffff8127daae>] ? pskb_expand_head+0xe0/0x24a
09:44:13   [<ffffffff810ea923>] ? __kmalloc+0xba/0x112
09:44:13   [<ffffffff8127daae>] ? pskb_expand_head+0xe0/0x24a
09:44:13   [<ffffffffa07a81c0>] ? ieee80211_skb_resize+0x64/0x9d [mac80211]
09:44:13   [<ffffffffa07aa087>] ? ieee80211_subif_start_xmit+0x68e/0x80c [mac80211]
09:44:13   [<ffffffffa07922c8>] ? ieee80211_tx_status_irqsafe+0x2e/0x7f [mac80211]
09:44:13   [<ffffffff812877f3>] ? dev_hard_start_xmit+0x3fc/0x543
09:44:13   [<ffffffff81070551>] ? arch_local_irq_save+0x11/0x17
09:44:13   [<ffffffff8129da01>] ? sch_direct_xmit+0x5e/0x12f
09:44:13   [<ffffffff8129dbc9>] ? __qdisc_run+0xf7/0x10f
09:44:13   [<ffffffff81282d56>] ? net_tx_action+0xf1/0x128
09:44:13   [<ffffffff8104bc88>] ? __do_softirq+0xb9/0x177
09:44:13   [<ffffffff8106a67b>] ? clockevents_program_event+0xaa/0xce
09:44:13   [<ffffffff81349dac>] ? call_softirq+0x1c/0x30
09:44:13   [<ffffffff8100f875>] ? do_softirq+0x3c/0x7b
09:44:13   [<ffffffff8104bef0>] ? irq_exit+0x3c/0x9a
09:44:13   [<ffffffff8100f5a5>] ? do_IRQ+0x82/0x98
09:44:13   [<ffffffff81342e2e>] ? common_interrupt+0x6e/0x6e
09:44:13   <EOI>  [<ffffffff81347c84>] ? sysret_audit+0x16/0x20
09:44:13  Mem-Info:
09:44:13  Node 0 DMA per-cpu:
09:44:13  CPU    0: hi:    0, btch:   1 usd:   0
09:44:13  CPU    1: hi:    0, btch:   1 usd:   0
09:44:13  Node 0 DMA32 per-cpu:
09:44:13  CPU    0: hi:  186, btch:  31 usd:  69
09:44:13  CPU    1: hi:  186, btch:  31 usd:  35
09:44:13  Node 0 Normal per-cpu:
09:44:13  CPU    0: hi:  186, btch:  31 usd: 178
09:44:13  CPU    1: hi:  186, btch:  31 usd:  68
09:44:13  active_anon:992947 inactive_anon:263499 isolated_anon:0
09:44:13   active_file:59734 inactive_file:536198 isolated_file:0
09:44:13   unevictable:1 dirty:57182 writeback:339764 unstable:7673
09:44:13   free:49028 slab_reclaimable:33311 slab_unreclaimable:30581
09:44:13   mapped:18500 shmem:43562 pagetables:21479 bounce:0
09:44:13  Node 0 DMA free:15908kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
09:44:13  lowmem_reserve[]: 0 2960 7947 7947
09:44:13  Node 0 DMA32 free:127380kB min:25128kB low:31408kB high:37692kB active_anon:1579276kB inactive_anon:524180kB active_file:33708kB inactive_file:643272kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3031688kB mlocked:0kB dirty:93920kB writeback:322428kB mapped:6704kB shmem:67068kB slab_reclaimable:41408kB slab_unreclaimable:30752kB kernel_stack:2216kB pagetables:24408kB unstable:10048kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
09:44:13  lowmem_reserve[]: 0 0 4986 4986
09:44:13  Node 0 Normal free:52824kB min:42324kB low:52904kB high:63484kB active_anon:2392512kB inactive_anon:529816kB active_file:205228kB inactive_file:1501520kB unevictable:4kB isolated(anon):0kB isolated(file):0kB present:5106560kB mlocked:4kB dirty:134808kB writeback:1036628kB mapped:67296kB shmem:107180kB slab_reclaimable:91836kB slab_unreclaimable:91572kB kernel_stack:4216kB pagetables:61508kB unstable:20644kB bounce:0kB writeback_tmp:0kB pages_scanned:112 all_unreclaimable? no
09:44:13  lowmem_reserve[]: 0 0 0 0
09:44:13  Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15908kB
09:44:13  Node 0 DMA32: 30640*4kB 1*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 127256kB
09:44:13  Node 0 Normal: 12182*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 52824kB
09:44:13  682828 total pagecache pages
09:44:13  43261 pages in swap cache
09:44:13  Swap cache stats: add 803340, delete 760079, find 297802/318575
09:44:13  Free swap  = 1763760kB
09:44:13  Total swap = 4106248kB
09:44:13  2080752 pages RAM
09:44:13  57163 pages reserved
09:44:13  992430 pages shared
09:44:13  1498647 pages non-shared
09:44:13  ieee80211 phy0: failed to reallocate TX buffer
09:44:13  kworker/0:0: page allocation failure: order:1, mode:0x20
09:44:13  Pid: 7949, comm: kworker/0:0 Tainted: G           O 3.2.5amd64-volpreempt-noide-20120208 #1
09:44:13  Call Trace:
09:44:13   <IRQ>  [<ffffffff810b755a>] ? warn_alloc_failed+0x11a/0x12d
09:44:13   [<ffffffff810ba394>] ? __alloc_pages_nodemask+0x6b2/0x726
09:44:13   [<ffffffff810e9081>] ? kmem_getpages+0x4c/0xd9
09:44:13   [<ffffffff810e9081>] ? kmem_getpages+0x4c/0xd9
09:44:13   [<ffffffff810ea421>] ? fallback_alloc+0x123/0x1c2
09:44:13   [<ffffffff8127daae>] ? pskb_expand_head+0xe0/0x24a
09:44:13   [<ffffffff810ea923>] ? __kmalloc+0xba/0x112
09:44:13   [<ffffffff8127daae>] ? pskb_expand_head+0xe0/0x24a
09:44:13   [<ffffffffa07a81c0>] ? ieee80211_skb_resize+0x64/0x9d [mac80211]
09:44:13   [<ffffffffa07aa087>] ? ieee80211_subif_start_xmit+0x68e/0x80c [mac80211]
09:44:13   [<ffffffffa07922c8>] ? ieee80211_tx_status_irqsafe+0x2e/0x7f [mac80211]
09:44:13   [<ffffffff812877f3>] ? dev_hard_start_xmit+0x3fc/0x543
09:44:13   [<ffffffff8129da01>] ? sch_d9704>] ? dec128+0x80c/0x80c [aes_x86_64]
09:44:13   [<ffffffff810349f1>] ? set_task_rq+0x23/0x35
09:44:13   [<ffffffff8118f328>] ? crypto_cbc_decrypt+0x5a/0x10a
09:44:13   [<ffffffff8118756b>] ? async_decrypt+0x37/0x3c
09:44:13   [<ffffffffa0108df7>] ? crypt_convert+0x223/0x2b8 [dm_crypt]
09:44:13   [<ffffffffa0109185>] ? kcryptd_crypt+0x56/0x342 [dm_crypt]
09:44:13   [<ffffffff810388e2>] ? finish_task_switch+0x4e/0xb9
09:44:13   [<ffffffff811a4021>] ? cfq_init_queue+0x403/0x403
09:44:13   [<ffffffffa010912f>] ? crypt_convert_init.isra.17+0x4f/0x4f [dm_crypt]
09:44:13   [<ffffffff8105ac3d>] ? process_one_work+0x163/0x284
09:44:13   [<ffffffff8105bc05>] ? worker_thread+0xc2/0x145
09:44:13   [<ffffffff8105bb43>] ? manage_workers.isra.23+0x15b/0x15b
09:44:13   [<ffffffff8105ed41>] ? kthread+0x76/0x7e
09:44:13   [<ffffffff81349cb4>] ? kernel_thread_helper+0x4/0x10
09:44:13   [<ffffffff8105eccb>] ? kthread_worker_fn+0x139/0x139
09:44:13   [<ffffffff81349cb0>] ? gs_change+0x13/0x13
09:44:13  Mem-Info:
09:44:13  Node 0 DMA per-cpu:
09:44:13  CPU    0: hi:    0, btch:   1 usd:   0
09:44:13  CPU    1: hi:    0, btch:   1 usd:   0
09:44:13  Node 0 DMA32 per-cpu:
09:44:13  CPU    0: hi:  186, btch:  31 usd:  17
09:44:13  CPU    1: hi:  186, btch:  31 usd:   0
09:44:13  Node 0 Normal per-cpu:
09:44:13  CPU    0: hi:  186, btch:  31 usd:  69
09:44:13  CPU    1: hi:  186, btch:  31 usd: 125
09:44:13  active_anon:992968 inactive_anon:263499 isolated_anon:1
09:44:13   active_file:59749 inactive_file:538771 isolated_file:12
09:44:13   unevictable:1 dirty:59393 writeback:339244 unstable:8193
09:44:13   free:46512 slab_reclaimable:33220 slab_unreclaimable:30692
09:44:13   mapped:18504 shmem:43598 pagetables:21479 bounce:0
09:44:13  Node 0 DMA free:15908kB min:128kB low:160kB high:192kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
09:44:13  lowmem_reserve[]: 0 2960 7947 7947
09:44:13  Node 0 DMA32 free:117308kB min:25128kB low:31408kB high:37692kB active_anon:1579336kB inactive_anon:524204kB active_file:33764kB inactive_file:653796kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3031688kB mlocked:0kB dirty:99216kB writeback:322356kB mapped:6716kB shmem:67128kB slab_reclaimable:41324kB slab_unreclaimable:30888kB kernel_stack:2216kB pagetables:24408kB unstable:10120kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
09:44:13  lowmem_reserve[]: 0 0 4986 4986
09:44:13  Node 0 Normal free:52832kB min:42324kB low:52904kB high:63484kB active_anon:2392536kB inactive_anon:529792kB active_file:205232kB inactive_file:1501288kB unevictable:4kB isolated(anon):4kB isolated(file):48kB present:5106560kB mlocked:4kB dirty:138356kB writeback:1034620kB mapped:67300kB shmem:107264kB slab_reclaimable:91556kB slab_unreclaimable:91880kB kernel_stack:4216kB pagetables:61508kB unstable:22652kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
09:44:13  lowmem_reserve[]: 0 0 0 0
09:44:13  Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15908kB
09:44:13  Node 0 DMA32: 28147*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 117308kB
09:44:13  Node 0 Normal: 12094*4kB 5*8kB 12*16kB 4*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 52832kB
09:44:13  685431 total pagecache pages
09:44:13  43261 pages in swap cache
09:44:13  Swap cache stats: add 803341, delete 760080, find 297802/318575
09:44:13  Free swap  = 1763756kB
09:44:13  Total swap = 4106248kB
09:44:13  2080752 pages RAM
09:44:13  57163 pages reserved
09:44:13  994159 pages shared
09:44:13  1498949 pages non-shared
09:44:13  ieee80211 phy0: failed to reallocate TX buffer
09:44:13  kworker/0:0: page allocation failure: order:1, mode:0x20

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
  2012-04-09 17:20   ` Marc MERLIN
@ 2012-04-09 18:12     ` David Miller
  2012-04-09 18:36         ` Marc MERLIN
  0 siblings, 1 reply; 72+ messages in thread
From: David Miller @ 2012-04-09 18:12 UTC (permalink / raw)
  To: marc; +Cc: bhutchings, linux-wireless, netdev

From: Marc MERLIN <marc@merlins.org>
Date: Mon, 9 Apr 2012 10:20:51 -0700

> Any idea what's going on, what bucket of RAM is an issue for the driver
> (total RAM is more than plenty), and where I should go from there?

The wireless layer is allocating high-order pages, so it's
not the amount of ram, it's the fragmentation of it leading
to a lack of those high-order pages.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
@ 2012-04-09 18:36         ` Marc MERLIN
  0 siblings, 0 replies; 72+ messages in thread
From: Marc MERLIN @ 2012-04-09 18:36 UTC (permalink / raw)
  To: David Miller; +Cc: bhutchings, linux-wireless, netdev

On Mon, Apr 09, 2012 at 02:12:41PM -0400, David Miller wrote:
> From: Marc MERLIN <marc@merlins.org>
> Date: Mon, 9 Apr 2012 10:20:51 -0700
> 
> > Any idea what's going on, what bucket of RAM is an issue for the driver
> > (total RAM is more than plenty), and where I should go from there?
> 
> The wireless layer is allocating high-order pages, so it's
> not the amount of ram, it's the fragmentation of it leading
> to a lack of those high-order pages.

I figured it was something of the sort, thanks for the professional
confirmation :)

What's my next step, file a bug with a specific team?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
@ 2012-04-09 18:36         ` Marc MERLIN
  0 siblings, 0 replies; 72+ messages in thread
From: Marc MERLIN @ 2012-04-09 18:36 UTC (permalink / raw)
  To: David Miller
  Cc: bhutchings-s/n/eUQHGBpZroRs9YW3xA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA

On Mon, Apr 09, 2012 at 02:12:41PM -0400, David Miller wrote:
> From: Marc MERLIN <marc-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
> Date: Mon, 9 Apr 2012 10:20:51 -0700
> 
> > Any idea what's going on, what bucket of RAM is an issue for the driver
> > (total RAM is more than plenty), and where I should go from there?
> 
> The wireless layer is allocating high-order pages, so it's
> not the amount of ram, it's the fragmentation of it leading
> to a lack of those high-order pages.

I figured it was something of the sort, thanks for the professional
confirmation :)

What's my next step, file a bug with a specific team?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
  2012-04-09 18:36         ` Marc MERLIN
  (?)
@ 2012-04-09 18:37         ` David Miller
  2012-04-09 18:58           ` Larry Finger
  -1 siblings, 1 reply; 72+ messages in thread
From: David Miller @ 2012-04-09 18:37 UTC (permalink / raw)
  To: marc; +Cc: bhutchings, linux-wireless, netdev

From: Marc MERLIN <marc@merlins.org>
Date: Mon, 9 Apr 2012 11:36:32 -0700

> On Mon, Apr 09, 2012 at 02:12:41PM -0400, David Miller wrote:
>> From: Marc MERLIN <marc@merlins.org>
>> Date: Mon, 9 Apr 2012 10:20:51 -0700
>> 
>> > Any idea what's going on, what bucket of RAM is an issue for the driver
>> > (total RAM is more than plenty), and where I should go from there?
>> 
>> The wireless layer is allocating high-order pages, so it's
>> not the amount of ram, it's the fragmentation of it leading
>> to a lack of those high-order pages.
> 
> I figured it was something of the sort, thanks for the professional
> confirmation :)
> 
> What's my next step, file a bug with a specific team?

Wait patiently for a wireless developer to look into your bug.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
  2012-04-09 18:37         ` David Miller
@ 2012-04-09 18:58           ` Larry Finger
  2012-04-09 19:11             ` Eric Dumazet
  0 siblings, 1 reply; 72+ messages in thread
From: Larry Finger @ 2012-04-09 18:58 UTC (permalink / raw)
  To: marc; +Cc: David Miller, bhutchings, linux-wireless, netdev

On 04/09/2012 01:37 PM, David Miller wrote:
> From: Marc MERLIN<marc@merlins.org>
> Date: Mon, 9 Apr 2012 11:36:32 -0700
>
>> On Mon, Apr 09, 2012 at 02:12:41PM -0400, David Miller wrote:
>>> From: Marc MERLIN<marc@merlins.org>
>>> Date: Mon, 9 Apr 2012 10:20:51 -0700
>>>
>>>> Any idea what's going on, what bucket of RAM is an issue for the driver
>>>> (total RAM is more than plenty), and where I should go from there?
>>>
>>> The wireless layer is allocating high-order pages, so it's
>>> not the amount of ram, it's the fragmentation of it leading
>>> to a lack of those high-order pages.
>>
>> I figured it was something of the sort, thanks for the professional
>> confirmation :)
>>
>> What's my next step, file a bug with a specific team?
>
> Wait patiently for a wireless developer to look into your bug.

As it happens with both iwlwifi and e1000e, it seems to be a problem further up 
the food chain.

I don't know much about iwlwifi, but loading it with the module parameter 
"amsdu_size_8K=0" seems to select 4K rather than 8K buffers. That will hurt 
performance, but it should fix the memory fragmentation. There have also been 
some problems with aggregation that are fixed by setting the option "11n_disable=3".

Larry

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
  2012-04-09 18:58           ` Larry Finger
@ 2012-04-09 19:11             ` Eric Dumazet
  2012-04-09 19:34               ` David Miller
  0 siblings, 1 reply; 72+ messages in thread
From: Eric Dumazet @ 2012-04-09 19:11 UTC (permalink / raw)
  To: Larry Finger; +Cc: marc, David Miller, bhutchings, linux-wireless, netdev

On Mon, 2012-04-09 at 13:58 -0500, Larry Finger wrote:

> As it happens with both iwlwifi and e1000e, it seems to be a problem further up 
> the food chain.
> 
> I don't know much about iwlwifi, but loading it with the module parameter 
> "amsdu_size_8K=0" seems to select 4K rather than 8K buffers. That will hurt 
> performance, but it should fix the memory fragmentation. There have also been 
> some problems with aggregation that are fixed by setting the option "11n_disable=3".

I think Marc posted stack traces showing problem on transmit side.

09:44:12   [<ffffffff810ba394>] ? __alloc_pages_nodemask+0x6b2/0x726
09:44:12   [<ffffffff810e9081>] ? kmem_getpages+0x4c/0xd9
09:44:12   [<ffffffff810e9081>] ? kmem_getpages+0x4c/0xd9
09:44:12   [<ffffffff810ea421>] ? fallback_alloc+0x123/0x1c2
09:44:12   [<ffffffff8127daae>] ? pskb_expand_head+0xe0/0x24a
09:44:12   [<ffffffff810ea923>] ? __kmalloc+0xba/0x112
09:44:12   [<ffffffff8127daae>] ? pskb_expand_head+0xe0/0x24a
09:44:12   [<ffffffffa07a81c0>] ? ieee80211_skb_resize+0x64/0x9d [mac80211]
09:44:12   [<ffffffffa07aa087>] ? ieee80211_subif_start_xmit+0x68e/0x80c [mac80211]
09:44:12   [<ffffffffa07922c8>] ? ieee80211_tx_status_irqsafe+0x2e/0x7f [mac80211]
09:44:12   [<ffffffff812877f3>] ? dev_hard_start_xmit+0x3fc/0x543
09:44:12   [<ffffffff81070551>] ? arch_local_irq_save+0x11/0x17
09:44:12   [<ffffffff8129da01>] ? sch_direct_xmit+0x5e/0x12f
09:44:12   [<ffffffff8129dbc9>] ? __qdisc_run+0xf7/0x10f

I dont really understand how it can happen, with MTU=1500




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
  2012-04-09 19:11             ` Eric Dumazet
@ 2012-04-09 19:34               ` David Miller
  2012-04-09 19:46                   ` Marc MERLIN
  2012-04-10  3:56                 ` Eric Dumazet
  0 siblings, 2 replies; 72+ messages in thread
From: David Miller @ 2012-04-09 19:34 UTC (permalink / raw)
  To: eric.dumazet; +Cc: Larry.Finger, marc, bhutchings, linux-wireless, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 09 Apr 2012 21:11:12 +0200

> I think Marc posted stack traces showing problem on transmit side.
 ...
> I dont really understand how it can happen, with MTU=1500

Depending upon the configuration and the driver, wireless can need
more headroom.  For encryption an extra 8 bytes are necessary, and the
driver may request a variable amount of extra headroom via
->hw.extra_tx_headroom

What wireless device are we dealing with again?

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
@ 2012-04-09 19:46                   ` Marc MERLIN
  0 siblings, 0 replies; 72+ messages in thread
From: Marc MERLIN @ 2012-04-09 19:46 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet, Larry.Finger, bhutchings, linux-wireless, netdev

On Mon, Apr 09, 2012 at 03:34:52PM -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 09 Apr 2012 21:11:12 +0200
> 
> > I think Marc posted stack traces showing problem on transmit side.
>  ...
> > I dont really understand how it can happen, with MTU=1500
> 
> Depending upon the configuration and the driver, wireless can need
> more headroom.  For encryption an extra 8 bytes are necessary, and the
> driver may request a variable amount of extra headroom via
> ->hw.extra_tx_headroom
> 
> What wireless device are we dealing with again?

Intel(R) Wireless WiFi Link AGN driver for Linux, in-tree:
Copyright(c) 2003-2011 Intel Corporation
iwlwifi 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
iwlwifi 0000:03:00.0: setting latency timer to 64
iwlwifi 0000:03:00.0: pci_resource_len = 0x00002000
iwlwifi 0000:03:00.0: pci_resource_base = ffffc900057a4000
iwlwifi 0000:03:00.0: HW Revision ID = 0x0
iwlwifi 0000:03:00.0: irq 47 for MSI/MSI-X
iwlwifi 0000:03:00.0: Detected Intel(R) Ultimate N WiFi Link 5300 AGN, REV=0x24
iwlwifi 0000:03:00.0: L1 Disabled; Enabling L0S
iwlwifi 0000:03:00.0: device EEPROM VER=0x11e, CALIB=0x4
iwlwifi 0000:03:00.0: Device SKU: 0Xf0
iwlwifi 0000:03:00.0: Tunable channels: 13 802.11bg, 24 802.11a channels

along with iwlwifi-5000-5.ucode

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
@ 2012-04-09 19:46                   ` Marc MERLIN
  0 siblings, 0 replies; 72+ messages in thread
From: Marc MERLIN @ 2012-04-09 19:46 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ,
	bhutchings-s/n/eUQHGBpZroRs9YW3xA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA

On Mon, Apr 09, 2012 at 03:34:52PM -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Date: Mon, 09 Apr 2012 21:11:12 +0200
> 
> > I think Marc posted stack traces showing problem on transmit side.
>  ...
> > I dont really understand how it can happen, with MTU=1500
> 
> Depending upon the configuration and the driver, wireless can need
> more headroom.  For encryption an extra 8 bytes are necessary, and the
> driver may request a variable amount of extra headroom via
> ->hw.extra_tx_headroom
> 
> What wireless device are we dealing with again?

Intel(R) Wireless WiFi Link AGN driver for Linux, in-tree:
Copyright(c) 2003-2011 Intel Corporation
iwlwifi 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
iwlwifi 0000:03:00.0: setting latency timer to 64
iwlwifi 0000:03:00.0: pci_resource_len = 0x00002000
iwlwifi 0000:03:00.0: pci_resource_base = ffffc900057a4000
iwlwifi 0000:03:00.0: HW Revision ID = 0x0
iwlwifi 0000:03:00.0: irq 47 for MSI/MSI-X
iwlwifi 0000:03:00.0: Detected Intel(R) Ultimate N WiFi Link 5300 AGN, REV=0x24
iwlwifi 0000:03:00.0: L1 Disabled; Enabling L0S
iwlwifi 0000:03:00.0: device EEPROM VER=0x11e, CALIB=0x4
iwlwifi 0000:03:00.0: Device SKU: 0Xf0
iwlwifi 0000:03:00.0: Tunable channels: 13 802.11bg, 24 802.11a channels

along with iwlwifi-5000-5.ucode

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
  2012-04-09 19:34               ` David Miller
  2012-04-09 19:46                   ` Marc MERLIN
@ 2012-04-10  3:56                 ` Eric Dumazet
  2012-04-10  5:11                   ` Marc MERLIN
  1 sibling, 1 reply; 72+ messages in thread
From: Eric Dumazet @ 2012-04-10  3:56 UTC (permalink / raw)
  To: David Miller; +Cc: Larry.Finger, marc, bhutchings, linux-wireless, netdev

On Mon, 2012-04-09 at 15:34 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 09 Apr 2012 21:11:12 +0200
> 
> > I think Marc posted stack traces showing problem on transmit side.
>  ...
> > I dont really understand how it can happen, with MTU=1500
> 
> Depending upon the configuration and the driver, wireless can need
> more headroom.  For encryption an extra 8 bytes are necessary, and the
> driver may request a variable amount of extra headroom via
> ->hw.extra_tx_headroom
> 
> What wireless device are we dealing with again?

Problem seems related to tailroom needed by mac80211
(IEEE80211_ENCRYPT_TAILROOM = 18 bytes)

So we must reallocate skb->head, thats impressive nobody cares.

[ 3007.249687] ieee80211_skb_resize(skb=ffff8802329846e8) cloned=1 head_need=0 tail_need=18 skb->len=1494 ksize=4096 tailroom=0 headroom=2282
[ 3007.249693] ieee80211_skb_resize(skb=ffff8802329846e8) cloned=0 head_need=0 tail_need=0 skb->len=1526 ksize=8192 tailroom=64 headroom=2250

Ouch... skb_tailroom() seems wrong ... it seems pskb_expand_head() is really suboptimal.

It appears tcp_sendmsg() tries to fill skb completely, with no available tailroom :

                        if (skb_tailroom(skb) > 0) {
                                /* We have some space in skb head. Superb! */
                                if (copy > skb_tailroom(skb))
                                        copy = skb_tailroom(skb);
                                err = skb_add_data_nocache(sk, skb, from, copy);
                                if (err)
                                        goto do_fault;
                        } else {


Shouldnt we take into account dev->needed_tailroom ?

I'll submit a pskb_expand_head() fix asap.




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
  2012-04-10  3:56                 ` Eric Dumazet
@ 2012-04-10  5:11                   ` Marc MERLIN
  2012-04-10  6:11                     ` Eric Dumazet
  0 siblings, 1 reply; 72+ messages in thread
From: Marc MERLIN @ 2012-04-10  5:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Larry.Finger, bhutchings, linux-wireless, netdev

On Tue, Apr 10, 2012 at 05:56:20AM +0200, Eric Dumazet wrote:
> > What wireless device are we dealing with again?
> 
> Problem seems related to tailroom needed by mac80211
> (IEEE80211_ENCRYPT_TAILROOM = 18 bytes)
> 
> So we must reallocate skb->head, thats impressive nobody cares.
> 
> [ 3007.249687] ieee80211_skb_resize(skb=ffff8802329846e8) cloned=1 head_need=0 tail_need=18 skb->len=1494 ksize=4096 tailroom=0 headroom=2282
> [ 3007.249693] ieee80211_skb_resize(skb=ffff8802329846e8) cloned=0 head_need=0 tail_need=0 skb->len=1526 ksize=8192 tailroom=64 headroom=2250
> 
> Ouch... skb_tailroom() seems wrong ... it seems pskb_expand_head() is really suboptimal.
> 
> It appears tcp_sendmsg() tries to fill skb completely, with no available tailroom :
> 
>                         if (skb_tailroom(skb) > 0) {
>                                 /* We have some space in skb head. Superb! */
>                                 if (copy > skb_tailroom(skb))
>                                         copy = skb_tailroom(skb);
>                                 err = skb_add_data_nocache(sk, skb, from, copy);
>                                 if (err)
>                                         goto do_fault;
>                         } else {
> 
> Shouldnt we take into account dev->needed_tailroom ?
> 
> I'll submit a pskb_expand_head() fix asap.

Thanks for finding this.

To answer an earlier question, I tried the non wireless case too.

The problem is harder to reproduce over e1000e though, I just got two short
hangs where my mouse cursor was hung for 5-10 seconds, but nothing in
syslog/dmesg this time.

I'm pretty sure this older log below did happen on e1000e with wireless disabled
though (but it had a taint 'O'):

If that helps, my earlier message had the traces below.

I can report back when you have a patch you'd like me to try out.

Thanks again,
Marc


> [28451.191115] WorkerPool/1248 D ffff88013bc93580     0 12483   3740 0x00000080
> [28451.191115]  ffff8801189ba100 0000000000000082 0000000000000000 ffff880134f2e180
> [28451.191115]  0000000000013580 ffff88001614bfd8 ffff88001614bfd8 ffff8801189ba100
> [28451.191115]  ffffffff811b4b62 000000010164525a 0000000000000046 ffffffff8165a250
> [28451.191115] Call Trace:
> [28451.191115]  [<ffffffff811b4b62>] ? sha_transform+0x395/0x1209
> [28451.191115]  [<ffffffff8134a9b4>] ? __mutex_lock_common.isra.6+0x13d/0x219
> [28451.191115]  [<ffffffff81242714>] ? extract_buf+0x86/0xf2
> [28451.191115]  [<ffffffff8134a7e6>] ? mutex_lock+0xf/0x1f
> [28451.191115]  [<ffffffff81298979>] ? rtnetlink_rcv+0xe/0x28
> [28451.191115]  [<ffffffff812ad007>] ? netlink_unicast+0xe6/0x14e
> [28451.191115]  [<ffffffff812ad26b>] ? netlink_sendmsg+0x1fc/0x237
> [28451.191115]  [<ffffffff8127c770>] ? sock_sendmsg+0xc1/0xde
> [28451.191115]  [<ffffffff810eca23>] ? __cache_free.isra.40+0x19/0x1a7
> [28451.191115]  [<ffffffff813496be>] ? nl_pid_hash_rehash+0xc8/0xef
> [28451.191115]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
> [28451.191115]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
> [28451.191115]  [<ffffffff8134e1d2>] ? sub_preempt_count+0x83/0x94
> [28451.191115]  [<ffffffff810fd81e>] ? fget_light+0x85/0x8d
> [28451.191115]  [<ffffffff8127e0e3>] ? sys_sendto+0xf7/0x137
> [28451.191115]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
> [28451.191115]  [<ffffffff8134e1d2>] ? sub_preempt_count+0x83/0x94
> [28451.191115]  [<ffffffff8134b725>] ? _raw_spin_unlock+0x24/0x30
> [28451.191115]  [<ffffffff8108d73e>] ? audit_syscall_entry+0x105/0x130
> [28451.191115]  [<ffffffff8134fd52>] ? system_call_fastpath+0x16/0x1b
> 
> 
> 
> Below are lines I got in syslog during the copy.
> Highlight is:
> [ 4437.367046] kworker/1:1: page allocation failure: order:1, mode:0x20
> and then:
> [ 8640.516177] INFO: task flush-0:37:7122 blocked for more than 120 seconds.
> and then 120,000 lines(!) of:
> [ 9654.042164] ieee80211 phy0: failed to reallocate TX buffer
> 
> unedited lines below.
> 
> So, any idea of what I can try next?
> 
> Thanks,
> Marc
> 
> 
> [ 4437.367046] kworker/1:1: page allocation failure: order:1, mode:0x20
> [ 4437.367053] Pid: 8067, comm: kworker/1:1 Tainted: G           O 3.2.8-amd64-volpreempt-noide-20120208 #1
> [ 4437.367056] Call Trace:
> [ 4437.367058]  <IRQ>  [<ffffffff810b9ec0>] ? warn_alloc_failed+0x11f/0x132
> [ 4437.367074]  [<ffffffff810bcdaa>] ? __alloc_pages_nodemask+0x6b1/0x72f
> [ 4437.367081]  [<ffffffff810ec911>] ? kmem_getpages+0x4c/0xd9
> [ 4437.367086]  [<ffffffff810ec911>] ? kmem_getpages+0x4c/0xd9
> [ 4437.367090]  [<ffffffff810edd21>] ? fallback_alloc+0x123/0x1c2
> [ 4437.367096]  [<ffffffff812846db>] ? pskb_expand_head+0xe0/0x24a
> [ 4437.367101]  [<ffffffff810ee215>] ? __kmalloc+0xb2/0x10a
> [ 4437.367105]  [<ffffffff812846db>] ? pskb_expand_head+0xe0/0x24a
> [ 4437.367139]  [<ffffffffa03e22c1>] ? ieee80211_skb_resize+0x64/0x9d [mac80211]
> [ 4437.367154]  [<ffffffffa03e4252>] ? ieee80211_subif_start_xmit+0x705/0x883 [mac80211]
> [ 4437.367175]  [<ffffffff8128e767>] ? dev_hard_start_xmit+0x40b/0x552
> [ 4437.367179]  [<ffffffff812a4adc>] ? sch_direct_xmit+0x63/0x13a
> [ 4437.367182]  [<ffffffff8128eb8e>] ? dev_queue_xmit+0x2e0/0x4b5
> [ 4437.367185]  [<ffffffff812b764d>] ? ip_finish_output2+0x1c7/0x218
> [ 4437.367188]  [<ffffffff812b86aa>] ? __ip_flush_pending_frames.isra.29+0x69/0x69
> [ 4437.367191]  [<ffffffff812b8a6a>] ? ip_queue_xmit+0x2cd/0x30d
> [ 4437.367195]  [<ffffffff81066be9>] ? getnstimeofday+0x4a/0x7b
> [ 4437.367198]  [<ffffffff812ca1d2>] ? tcp_transmit_skb+0x6d7/0x70a
> [ 4437.367201]  [<ffffffff812cac5f>] ? tcp_write_xmit+0x698/0x7a1
> [ 4437.367204]  [<ffffffff812c77bf>] ? tcp_ack+0x14e3/0x1658
> [ 4437.367207]  [<ffffffff812c89bd>] ? tcp_established_options+0x2b/0x9e
> [ 4437.367210]  [<ffffffff812cada9>] ? __tcp_push_pending_frames+0x18/0x44
> [ 4437.367213]  [<ffffffff812c4e27>] ? tcp_data_snd_check+0x2c/0xfd
> [ 4437.367216]  [<ffffffff812c86c5>] ? tcp_rcv_established+0x4f0/0x549
> [ 4437.367220]  [<ffffffff8103ec39>] ? select_task_rq_fair+0x67b/0x690
> [ 4437.367223]  [<ffffffff812ce735>] ? tcp_v4_do_rcv+0x166/0x323
> [ 4437.367226]  [<ffffffff812cfdce>] ? tcp_v4_rcv+0x404/0x65d
> [ 4437.367230]  [<ffffffff812b4d55>] ? ip_local_deliver_finish+0x148/0x1ba
> [ 4437.367233]  [<ffffffff8128cfa4>] ? __netif_receive_skb+0x3f2/0x43f
> [ 4437.367236]  [<ffffffff8128d31d>] ? netif_receive_skb+0x7e/0x84
> [ 4437.367239]  [<ffffffff8128d7dd>] ? napi_gro_receive+0x1c/0x29
> [ 4437.367241]  [<ffffffff8128d398>] ? napi_skb_finish+0x1c/0x31
> [ 4437.367253]  [<ffffffffa026bde3>] ? e1000_clean_rx_irq+0x1f3/0x290 [e1000e]
> [ 4437.367261]  [<ffffffffa026c26c>] ? e1000_clean+0x69/0x208 [e1000e]
> [ 4437.367264]  [<ffffffff8128d8fb>] ? net_rx_action+0xa4/0x1c0
> [ 4437.367268]  [<ffffffff8104c581>] ? __do_softirq+0xc0/0x188
> [ 4437.367272]  [<ffffffff81351fac>] ? call_softirq+0x1c/0x30
> [ 4437.367276]  [<ffffffff8100f98d>] ? do_softirq+0x3c/0x7b
> [ 4437.367278]  [<ffffffff8104c87c>] ? irq_exit+0x3d/0xa7
> [ 4437.367281]  [<ffffffff8100f6b4>] ? do_IRQ+0x81/0x97
> [ 4437.367285]  [<ffffffff8134ba2e>] ? common_interrupt+0x6e/0x6e
> [ 4437.367287]  <EOI>  [<ffffffffa008b32c>] ? dec128+0x434/0x80c [aes_x86_64]
> [ 4437.367307]  [<ffffffffa0085164>] ? crypt+0xae/0x101 [xts]
> [ 4437.367313]  [<ffffffffa008b712>] ? aes_decrypt+0xe/0xe [aes_x86_64]
> [ 4437.367320]  [<ffffffffa008b704>] ? dec128+0x80c/0x80c [aes_x86_64]
> [ 4437.367327]  [<ffffffffa00851f6>] ? decrypt+0x3f/0x44 [xts]
> [ 4437.367331]  [<ffffffff8118cdb3>] ? async_decrypt+0x37/0x3c
> [ 4437.367338]  [<ffffffffa0105e2a>] ? crypt_convert+0x22f/0x2c4 [dm_crypt]
> [ 4437.367342]  [<ffffffff8100d02f>] ? load_TLS+0x7/0xa
> [ 4437.367348]  [<ffffffffa01061b8>] ? kcryptd_crypt+0x56/0x342 [dm_crypt]
> [ 4437.367352]  [<ffffffff81038cd2>] ? finish_task_switch+0x86/0xb7
> [ 4437.367355]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
> [ 4437.367358]  [<ffffffff8134e1d2>] ? sub_preempt_count+0x83/0x94
> [ 4437.367361]  [<ffffffff8103612b>] ? need_resched+0x1a/0x23
> [ 4437.367368]  [<ffffffffa0106162>] ? crypt_convert_init.isra.14+0x4f/0x4f [dm_crypt]
> [ 4437.367372]  [<ffffffff8105b867>] ? process_one_work+0x16d/0x298
> [ 4437.367375]  [<ffffffff8105c84a>] ? worker_thread+0xc2/0x145
> [ 4437.367378]  [<ffffffff8105c788>] ? manage_workers.isra.23+0x15b/0x15b
> [ 4437.367381]  [<ffffffff8105f9fe>] ? kthread+0x76/0x7e
> [ 4437.367384]  [<ffffffff81351eb4>] ? kernel_thread_helper+0x4/0x10
> [ 4437.367387]  [<ffffffff8105f988>] ? kthread_worker_fn+0x139/0x139
> [ 4437.367390]  [<ffffffff81351eb0>] ? gs_change+0x13/0x13
> [ 4437.367392] Mem-Info:
> [ 4437.367393] Node 0 DMA per-cpu:
> [ 4437.367396] CPU    0: hi:    0, btch:   1 usd:   0
> [ 4437.367397] CPU    1: hi:    0, btch:   1 usd:   0
> [ 4437.367399] Node 0 DMA32 per-cpu:
> [ 4437.367401] CPU    0: hi:  186, btch:  31 usd: 164
> [ 4437.367403] CPU    1: hi:  186, btch:  31 usd: 111
> [ 4437.367405] Node 0 Normal per-cpu:
> [ 4437.367407] CPU    0: hi:  186, btch:  31 usd: 114
> [ 4437.367409] CPU    1: hi:  186, btch:  31 usd: 158
> [ 4437.367413] active_anon:391300 inactive_anon:132951 isolated_anon:0
> [ 4437.367414]  active_file:136666 inactive_file:140710 isolated_file:31
> [ 4437.367415]  unevictable:1 dirty:3402 writeback:26688 unstable:7844
> [ 4437.367416]  free:36509 slab_reclaimable:85289 slab_unreclaimable:35524
> [ 4437.367417]  mapped:18088 shmem:35934 pagetables:9300 bounce:0
> [ 4437.367419] Node 0 DMA free:15712kB min:260kB low:324kB high:388kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:36kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:36kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:160kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:40833 all_unreclaimable? yes
> [ 4437.367428] lowmem_reserve[]: 0 2960 3907 3907
> [ 4437.367432] Node 0 DMA32 free:110732kB min:51004kB low:63752kB high:76504kB active_anon:1380396kB inactive_anon:345140kB active_file:422008kB inactive_file:437440kB unevictable:4kB isolated(anon):0kB isolated(file):124kB present:3031688kB mlocked:4kB dirty:7148kB writeback:72004kB mapped:39424kB shmem:64836kB slab_reclaimable:212408kB slab_unreclaimable:80516kB kernel_stack:1720kB pagetables:19252kB unstable:23964kB bounce:0kB writeback_tmp:0kB pages_scanned:63 all_unreclaimable? no
> [ 4437.367442] lowmem_reserve[]: 0 0 946 946
> [ 4437.367445] Node 0 Normal free:19592kB min:16312kB low:20388kB high:24468kB active_anon:184804kB inactive_anon:186664kB active_file:124656kB inactive_file:125364kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:969600kB mlocked:0kB dirty:6460kB writeback:34712kB mapped:32928kB shmem:78900kB slab_reclaimable:128748kB slab_unreclaimable:61420kB kernel_stack:2792kB pagetables:17948kB unstable:7412kB bounce:0kB writeback_tmp:0kB pages_scanned:89 all_unreclaimable? no
> [ 4437.367455] lowmem_reserve[]: 0 0 0 0
> [ 4437.367458] Node 0 DMA: 2*4kB 1*8kB 1*16kB 0*32kB 1*64kB 2*128kB 2*256kB 1*512kB 2*1024kB 2*2048kB 2*4096kB = 15712kB
> [ 4437.367467] Node 0 DMA32: 25961*4kB 73*8kB 8*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 110732kB
> [ 4437.367475] Node 0 Normal: 4134*4kB 0*8kB 1*16kB 1*32kB 0*64kB 2*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 19656kB
> [ 4437.367484] 317456 total pagecache pages
> [ 4437.367485] 4042 pages in swap cache
> [ 4437.367487] Swap cache stats: add 31786, delete 27744, find 10282/11070
> [ 4437.367489] Free swap  = 4012560kB
> [ 4437.367490] Total swap = 4106248kB
> [ 4437.370978] 1032176 pages RAM
> [ 4437.370978] 42834 pages reserved
> [ 4437.370978] 390787 pages shared
> [ 4437.370978] 750687 pages non-shared
> 
> 
> [ 8640.516177] INFO: task flush-0:37:7122 blocked for more than 120 seconds.
> [ 8640.516182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 8640.516186] flush-0:37      D ffff88013bc93580     0  7122      2 0x00000080
> [ 8640.516192]  ffff880072c28810 0000000000000046 ffff880100000000 ffff880134f2e180
> [ 8640.516199]  0000000000013580 ffff88006d491fd8 ffff88006d491fd8 ffff880072c28810
> [ 8640.516205]  ffff88013bfd1c50 000000018134b58b ffff88010c3cc1b0 ffff88006d491d18
> [ 8640.516211] Call Trace:
> [ 8640.516221]  [<ffffffff8110e81a>] ? inode_owner_or_capable+0x36/0x36
> [ 8640.516226]  [<ffffffff8110e820>] ? inode_wait+0x6/0xa
> [ 8640.516232]  [<ffffffff8134a72c>] ? __wait_on_bit+0x3e/0x71
> [ 8640.516241]  [<ffffffff8103e0fa>] ? get_parent_ip+0x9/0x1b
> [ 8640.516245]  [<ffffffff81119674>] ? inode_wait_for_writeback+0xa2/0xc8
> [ 8640.516249]  [<ffffffff810600c9>] ? autoremove_wake_function+0x2a/0x2a
> [ 8640.516252]  [<ffffffff8111b4b4>] ? wb_writeback+0x226/0x255
> [ 8640.516255]  [<ffffffff8134e27d>] ? add_preempt_count+0x9a/0x9c
> [ 8640.516258]  [<ffffffff8111b8d4>] ? wb_do_writeback+0x150/0x1b2
> [ 8640.516261]  [<ffffffff8111b9c5>] ? bdi_writeback_thread+0x8f/0x204
> [ 8640.516264]  [<ffffffff8111b936>] ? wb_do_writeback+0x1b2/0x1b2
> [ 8640.516266]  [<ffffffff8105f9fe>] ? kthread+0x76/0x7e
> [ 8640.516270]  [<ffffffff81351eb4>] ? kernel_thread_helper+0x4/0x10
> [ 8640.516273]  [<ffffffff8105f988>] ? kthread_worker_fn+0x139/0x139
> [ 8640.516275]  [<ffffffff81351eb0>] ? gs_change+0x13/0x13
> [ 8640.516281] INFO: task cp:7568 blocked for more than 120 seconds.
> [ 8640.516283] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 8640.516284] cp              D ffff88013bc13580     0  7568   6744 0x00000080
> [ 8640.516288]  ffff880123976750 0000000000000082 0000000000000000 ffffffff8160d020
> [ 8640.516292]  0000000000013580 ffff88001b3a9fd8 ffff88001b3a9fd8 ffff880123976750
> [ 8640.516295]  0000000000000001 0000000181066767 ffff880131463e50 ffff88013bc13e08
> [ 8640.516299] Call Trace:
> [ 8640.516303]  [<ffffffff810b5d03>] ? __lock_page+0x66/0x66
> [ 8640.516306]  [<ffffffff8134a2ec>] ? io_schedule+0x58/0x6f
> [ 8640.516308]  [<ffffffff810b5d09>] ? sleep_on_page+0x6/0xa
> [ 8640.516311]  [<ffffffff8134a72c>] ? __wait_on_bit+0x3e/0x71
> [ 8640.516313]  [<ffffffff810b5e51>] ? wait_on_page_bit+0x6e/0x73
> [ 8640.516316]  [<ffffffff810600c9>] ? autoremove_wake_function+0x2a/0x2a
> [ 8640.516319]  [<ffffffff810b5f29>] ? filemap_fdatawait_range+0x74/0x139
> [ 8640.516327]  [<ffffffff8111acab>] ? writeback_single_inode+0x155/0x2f4
> [ 8640.516330]  [<ffffffff8111ae94>] ? sync_inode+0x4a/0x6f
> [ 8640.516343]  [<ffffffffa06b9b02>] ? nfs_wb_all+0x39/0x3e [nfs]
> [ 8640.516351]  [<ffffffffa06aeed1>] ? nfs_setattr+0x8e/0xf6 [nfs]
> [ 8640.516354]  [<ffffffff811104c3>] ? notify_change+0x177/0x24f
> [ 8640.516357]  [<ffffffff8111e85c>] ? utimes_common+0x10c/0x135
> [ 8640.516361]  [<ffffffff810fd55a>] ? fget+0x50/0x57
> [ 8640.516364]  [<ffffffff8111e90f>] ? do_utimes+0x8a/0xd6
> [ 8640.516367]  [<ffffffff810fc7a2>] ? vfs_read+0x9f/0xe6
> [ 8640.516369]  [<ffffffff8111ea24>] ? sys_utimensat+0x64/0x6b
> [ 8640.516372]  [<ffffffff8134fd52>] ? system_call_fastpath+0x16/0x1b
> 
> 
> [ 9654.042164] ieee80211 phy0: failed to reallocate TX buffer
> [ 9654.042189] ieee80211 phy0: failed to reallocate TX buffer
> (120,000 lines of this)

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
  2012-04-10  5:11                   ` Marc MERLIN
@ 2012-04-10  6:11                     ` Eric Dumazet
  2012-04-11  5:27                       ` Marc MERLIN
  0 siblings, 1 reply; 72+ messages in thread
From: Eric Dumazet @ 2012-04-10  6:11 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: David Miller, Larry.Finger, bhutchings, linux-wireless, netdev

On Mon, 2012-04-09 at 22:11 -0700, Marc MERLIN wrote:
> On Tue, Apr 10, 2012 at 05:56:20AM +0200, Eric Dumazet wrote:
> > > What wireless device are we dealing with again?
> > 
> > Problem seems related to tailroom needed by mac80211
> > (IEEE80211_ENCRYPT_TAILROOM = 18 bytes)
> > 
> > So we must reallocate skb->head, thats impressive nobody cares.
> > 
> > [ 3007.249687] ieee80211_skb_resize(skb=ffff8802329846e8) cloned=1 head_need=0 tail_need=18 skb->len=1494 ksize=4096 tailroom=0 headroom=2282
> > [ 3007.249693] ieee80211_skb_resize(skb=ffff8802329846e8) cloned=0 head_need=0 tail_need=0 skb->len=1526 ksize=8192 tailroom=64 headroom=2250
> > 
> > Ouch... skb_tailroom() seems wrong ... it seems pskb_expand_head() is really suboptimal.
> > 
> > It appears tcp_sendmsg() tries to fill skb completely, with no available tailroom :
> > 
> >                         if (skb_tailroom(skb) > 0) {
> >                                 /* We have some space in skb head. Superb! */
> >                                 if (copy > skb_tailroom(skb))
> >                                         copy = skb_tailroom(skb);
> >                                 err = skb_add_data_nocache(sk, skb, from, copy);
> >                                 if (err)
> >                                         goto do_fault;
> >                         } else {
> > 
> > Shouldnt we take into account dev->needed_tailroom ?
> > 
> > I'll submit a pskb_expand_head() fix asap.
> 
> Thanks for finding this.
> 
> To answer an earlier question, I tried the non wireless case too.
> 
> The problem is harder to reproduce over e1000e though, I just got two short
> hangs where my mouse cursor was hung for 5-10 seconds, but nothing in
> syslog/dmesg this time.
> 
> I'm pretty sure this older log below did happen on e1000e with wireless disabled
> though (but it had a taint 'O'):
> 
> If that helps, my earlier message had the traces below.
> 
> I can report back when you have a patch you'd like me to try out.

Hi Marc

Please try following patch, as it solved the problem for me (no more
order-1 allocations in tx path)

Thanks !

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 3337027..70a3f8d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -481,6 +481,7 @@ struct sk_buff {
 	union {
 		__u32		mark;
 		__u32		dropcount;
+		__u32		avail_size;
 	};
 
 	sk_buff_data_t		transport_header;
@@ -1366,6 +1367,18 @@ static inline int skb_tailroom(const struct sk_buff *skb)
 }
 
 /**
+ *	skb_availroom - bytes at buffer end
+ *	@skb: buffer to check
+ *
+ *	Return the number of bytes of free space at the tail of an sk_buff
+ *	allocated by sk_stream_alloc()
+ */
+static inline int skb_availroom(const struct sk_buff *skb)
+{
+	return skb_is_nonlinear(skb) ? 0 : skb->avail_size - skb->len;
+}
+
+/**
  *	skb_reserve - adjust headroom
  *	@skb: buffer to alter
  *	@len: bytes to move
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index baf8d28..1887454 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -952,9 +952,11 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 		goto adjust_others;
 	}
 
-	data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
+	data = kmalloc(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
+		       gfp_mask);
 	if (!data)
 		goto nodata;
+	size = ksize(data) - SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
 	/* Copy only real data... and, alas, header. This should be
 	 * optimized for the cases when header is void.
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5d54ed3..87f497f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -701,11 +701,12 @@ struct sk_buff *sk_stream_alloc_skb(struct sock *sk, int size, gfp_t gfp)
 	skb = alloc_skb_fclone(size + sk->sk_prot->max_header, gfp);
 	if (skb) {
 		if (sk_wmem_schedule(sk, skb->truesize)) {
+			skb_reserve(skb, sk->sk_prot->max_header);
 			/*
 			 * Make sure that we have exactly size bytes
 			 * available to the caller, no more, no less.
 			 */
-			skb_reserve(skb, skb_tailroom(skb) - size);
+			skb->avail_size = size;		
 			return skb;
 		}
 		__kfree_skb(skb);
@@ -995,10 +996,9 @@ new_segment:
 				copy = seglen;
 
 			/* Where to copy to? */
-			if (skb_tailroom(skb) > 0) {
+			if (skb_availroom(skb) > 0) {
 				/* We have some space in skb head. Superb! */
-				if (copy > skb_tailroom(skb))
-					copy = skb_tailroom(skb);
+				copy = min_t(int, copy, skb_availroom(skb));
 				err = skb_add_data_nocache(sk, skb, from, copy);
 				if (err)
 					goto do_fault;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 364784a..376b2cf 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2060,7 +2060,7 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
 		/* Punt if not enough space exists in the first SKB for
 		 * the data in the second
 		 */
-		if (skb->len > skb_tailroom(to))
+		if (skb->len > skb_availroom(to))
 			break;
 
 		if (after(TCP_SKB_CB(skb)->end_seq, tcp_wnd_end(tp)))



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
  2012-04-10  6:11                     ` Eric Dumazet
@ 2012-04-11  5:27                       ` Marc MERLIN
  2012-04-11  5:43                           ` Eric Dumazet
                                           ` (2 more replies)
  0 siblings, 3 replies; 72+ messages in thread
From: Marc MERLIN @ 2012-04-11  5:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Larry.Finger, bhutchings, linux-wireless, netdev

On Tue, Apr 10, 2012 at 08:11:03AM +0200, Eric Dumazet wrote:
> Please try following patch, as it solved the problem for me (no more
> order-1 allocations in tx path)

I applied our patch to 3.3.1 and cannot reproduce the problem anymore.

I'll leave a big wireless copy running overnight just in case, but I think
you fixed it.

Thanks much,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
@ 2012-04-11  5:43                           ` Eric Dumazet
  0 siblings, 0 replies; 72+ messages in thread
From: Eric Dumazet @ 2012-04-11  5:43 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: David Miller, Larry.Finger, bhutchings, linux-wireless, netdev

On Tue, 2012-04-10 at 22:27 -0700, Marc MERLIN wrote:
> On Tue, Apr 10, 2012 at 08:11:03AM +0200, Eric Dumazet wrote:
> > Please try following patch, as it solved the problem for me (no more
> > order-1 allocations in tx path)
> 
> I applied our patch to 3.3.1 and cannot reproduce the problem anymore.
> 
> I'll leave a big wireless copy running overnight just in case, but I think
> you fixed it.
> 
> Thanks much,
> Marc

Thanks Marc for bringing this issue.

I have a lenovo T420s laptop and could debug the thing pretty fast.

I'll send two official patches.



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure)
@ 2012-04-11  5:43                           ` Eric Dumazet
  0 siblings, 0 replies; 72+ messages in thread
From: Eric Dumazet @ 2012-04-11  5:43 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: David Miller, Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ,
	bhutchings-s/n/eUQHGBpZroRs9YW3xA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA

On Tue, 2012-04-10 at 22:27 -0700, Marc MERLIN wrote:
> On Tue, Apr 10, 2012 at 08:11:03AM +0200, Eric Dumazet wrote:
> > Please try following patch, as it solved the problem for me (no more
> > order-1 allocations in tx path)
> 
> I applied our patch to 3.3.1 and cannot reproduce the problem anymore.
> 
> I'll leave a big wireless copy running overnight just in case, but I think
> you fixed it.
> 
> Thanks much,
> Marc

Thanks Marc for bringing this issue.

I have a lenovo T420s laptop and could debug the thing pretty fast.

I'll send two official patches.


--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH] net: allow pskb_expand_head() to get maximum tailroom
  2012-04-11  5:27                       ` Marc MERLIN
  2012-04-11  5:43                           ` Eric Dumazet
@ 2012-04-11  6:08                         ` Eric Dumazet
  2012-04-11 14:11                             ` David Miller
  2012-07-15 21:59                         ` 3.4.4/amd64 full interrupt hangs under big nfs copies Marc MERLIN
  2 siblings, 1 reply; 72+ messages in thread
From: Eric Dumazet @ 2012-04-11  6:08 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: David Miller, Larry.Finger, bhutchings, linux-wireless, netdev

Marc Merlin reported many order-1 allocations failures in TX path on its
wireless setup, that dont make any sense with MTU=1500 network, and non
SG capable hardware.

Turns out part of the problem comes from pskb_expand_head() not using
ksize() to get exact head size given by kmalloc(). Doing the same thing
than __alloc_skb() allows more tailroom in skb and can prevent future
reallocations.

As a bonus, struct skb_shared_info becomes cache line aligned.

Reported-by: Marc MERLIN <marc@merlins.org>
Tested-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/skbuff.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index baf8d28..e598400 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -952,9 +952,11 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 		goto adjust_others;
 	}
 
-	data = kmalloc(size + sizeof(struct skb_shared_info), gfp_mask);
+	data = kmalloc(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
+		       gfp_mask);
 	if (!data)
 		goto nodata;
+	size = SKB_WITH_OVERHEAD(ksize(data));
 
 	/* Copy only real data... and, alas, header. This should be
 	 * optimized for the cases when header is void.



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH] tcp: avoid order-1 allocations on wifi and tx path
  2012-04-11  5:43                           ` Eric Dumazet
@ 2012-04-11  6:30                             ` Eric Dumazet
  -1 siblings, 0 replies; 72+ messages in thread
From: Eric Dumazet @ 2012-04-11  6:30 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: David Miller, Larry.Finger, bhutchings, linux-wireless, netdev

Marc Merlin reported many order-1 allocations failures in TX path on its
wireless setup, that dont make any sense with MTU=1500 network, and non
SG capable hardware.

After investigation, it turns out TCP uses sk_stream_alloc_skb() and
used as a convention skb_tailroom(skb) to know how many bytes of data
payload could be put in this skb (for non SG capable devices)

Note : these skb used kmalloc-4096 (MTU=1500 + MAX_HEADER +
sizeof(struct skb_shared_info) being above 2048)

Later, mac80211 layer need to add some bytes at the tail of skb
(IEEE80211_ENCRYPT_TAILROOM = 18 bytes) and since no more tailroom is
available has to call pskb_expand_head() and request order-1
allocations.

This patch changes sk_stream_alloc_skb() so that only
sk->sk_prot->max_header bytes of headroom are reserved, and use a new
skb field, avail_size to hold the data payload limit.

This way, order-0 allocations done by TCP stack can leave more than 2 KB
of tailroom and no more allocation is performed in mac80211 layer (or
any layer needing some tailroom)

avail_size is unioned with mark/dropcount, since mark will be set later
in IP stack for output packets. Therefore, skb size is unchanged.

Reported-by: Marc MERLIN <marc@merlins.org>
Tested-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/skbuff.h |   13 +++++++++++++
 net/ipv4/tcp.c         |    8 ++++----
 net/ipv4/tcp_output.c  |    2 +-
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 3337027..70a3f8d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -481,6 +481,7 @@ struct sk_buff {
 	union {
 		__u32		mark;
 		__u32		dropcount;
+		__u32		avail_size;
 	};
 
 	sk_buff_data_t		transport_header;
@@ -1366,6 +1367,18 @@ static inline int skb_tailroom(const struct sk_buff *skb)
 }
 
 /**
+ *	skb_availroom - bytes at buffer end
+ *	@skb: buffer to check
+ *
+ *	Return the number of bytes of free space at the tail of an sk_buff
+ *	allocated by sk_stream_alloc()
+ */
+static inline int skb_availroom(const struct sk_buff *skb)
+{
+	return skb_is_nonlinear(skb) ? 0 : skb->avail_size - skb->len;
+}
+
+/**
  *	skb_reserve - adjust headroom
  *	@skb: buffer to alter
  *	@len: bytes to move
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5d54ed3..87f497f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -701,11 +701,12 @@ struct sk_buff *sk_stream_alloc_skb(struct sock *sk, int size, gfp_t gfp)
 	skb = alloc_skb_fclone(size + sk->sk_prot->max_header, gfp);
 	if (skb) {
 		if (sk_wmem_schedule(sk, skb->truesize)) {
+			skb_reserve(skb, sk->sk_prot->max_header);
 			/*
 			 * Make sure that we have exactly size bytes
 			 * available to the caller, no more, no less.
 			 */
-			skb_reserve(skb, skb_tailroom(skb) - size);
+			skb->avail_size = size;		
 			return skb;
 		}
 		__kfree_skb(skb);
@@ -995,10 +996,9 @@ new_segment:
 				copy = seglen;
 
 			/* Where to copy to? */
-			if (skb_tailroom(skb) > 0) {
+			if (skb_availroom(skb) > 0) {
 				/* We have some space in skb head. Superb! */
-				if (copy > skb_tailroom(skb))
-					copy = skb_tailroom(skb);
+				copy = min_t(int, copy, skb_availroom(skb));
 				err = skb_add_data_nocache(sk, skb, from, copy);
 				if (err)
 					goto do_fault;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 364784a..376b2cf 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2060,7 +2060,7 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
 		/* Punt if not enough space exists in the first SKB for
 		 * the data in the second
 		 */
-		if (skb->len > skb_tailroom(to))
+		if (skb->len > skb_availroom(to))
 			break;
 
 		if (after(TCP_SKB_CB(skb)->end_seq, tcp_wnd_end(tp)))



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH] tcp: avoid order-1 allocations on wifi and tx path
@ 2012-04-11  6:30                             ` Eric Dumazet
  0 siblings, 0 replies; 72+ messages in thread
From: Eric Dumazet @ 2012-04-11  6:30 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: David Miller, Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ,
	bhutchings-s/n/eUQHGBpZroRs9YW3xA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA

Marc Merlin reported many order-1 allocations failures in TX path on its
wireless setup, that dont make any sense with MTU=1500 network, and non
SG capable hardware.

After investigation, it turns out TCP uses sk_stream_alloc_skb() and
used as a convention skb_tailroom(skb) to know how many bytes of data
payload could be put in this skb (for non SG capable devices)

Note : these skb used kmalloc-4096 (MTU=1500 + MAX_HEADER +
sizeof(struct skb_shared_info) being above 2048)

Later, mac80211 layer need to add some bytes at the tail of skb
(IEEE80211_ENCRYPT_TAILROOM = 18 bytes) and since no more tailroom is
available has to call pskb_expand_head() and request order-1
allocations.

This patch changes sk_stream_alloc_skb() so that only
sk->sk_prot->max_header bytes of headroom are reserved, and use a new
skb field, avail_size to hold the data payload limit.

This way, order-0 allocations done by TCP stack can leave more than 2 KB
of tailroom and no more allocation is performed in mac80211 layer (or
any layer needing some tailroom)

avail_size is unioned with mark/dropcount, since mark will be set later
in IP stack for output packets. Therefore, skb size is unchanged.

Reported-by: Marc MERLIN <marc-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
Tested-by: Marc MERLIN <marc-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
Signed-off-by: Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 include/linux/skbuff.h |   13 +++++++++++++
 net/ipv4/tcp.c         |    8 ++++----
 net/ipv4/tcp_output.c  |    2 +-
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 3337027..70a3f8d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -481,6 +481,7 @@ struct sk_buff {
 	union {
 		__u32		mark;
 		__u32		dropcount;
+		__u32		avail_size;
 	};
 
 	sk_buff_data_t		transport_header;
@@ -1366,6 +1367,18 @@ static inline int skb_tailroom(const struct sk_buff *skb)
 }
 
 /**
+ *	skb_availroom - bytes at buffer end
+ *	@skb: buffer to check
+ *
+ *	Return the number of bytes of free space at the tail of an sk_buff
+ *	allocated by sk_stream_alloc()
+ */
+static inline int skb_availroom(const struct sk_buff *skb)
+{
+	return skb_is_nonlinear(skb) ? 0 : skb->avail_size - skb->len;
+}
+
+/**
  *	skb_reserve - adjust headroom
  *	@skb: buffer to alter
  *	@len: bytes to move
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5d54ed3..87f497f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -701,11 +701,12 @@ struct sk_buff *sk_stream_alloc_skb(struct sock *sk, int size, gfp_t gfp)
 	skb = alloc_skb_fclone(size + sk->sk_prot->max_header, gfp);
 	if (skb) {
 		if (sk_wmem_schedule(sk, skb->truesize)) {
+			skb_reserve(skb, sk->sk_prot->max_header);
 			/*
 			 * Make sure that we have exactly size bytes
 			 * available to the caller, no more, no less.
 			 */
-			skb_reserve(skb, skb_tailroom(skb) - size);
+			skb->avail_size = size;		
 			return skb;
 		}
 		__kfree_skb(skb);
@@ -995,10 +996,9 @@ new_segment:
 				copy = seglen;
 
 			/* Where to copy to? */
-			if (skb_tailroom(skb) > 0) {
+			if (skb_availroom(skb) > 0) {
 				/* We have some space in skb head. Superb! */
-				if (copy > skb_tailroom(skb))
-					copy = skb_tailroom(skb);
+				copy = min_t(int, copy, skb_availroom(skb));
 				err = skb_add_data_nocache(sk, skb, from, copy);
 				if (err)
 					goto do_fault;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 364784a..376b2cf 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2060,7 +2060,7 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
 		/* Punt if not enough space exists in the first SKB for
 		 * the data in the second
 		 */
-		if (skb->len > skb_tailroom(to))
+		if (skb->len > skb_availroom(to))
 			break;
 
 		if (after(TCP_SKB_CB(skb)->end_seq, tcp_wnd_end(tp)))


--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH] tcp: avoid order-1 allocations on wifi and tx path
  2012-04-11  6:30                             ` Eric Dumazet
@ 2012-04-11  7:38                               ` Eric Dumazet
  -1 siblings, 0 replies; 72+ messages in thread
From: Eric Dumazet @ 2012-04-11  7:38 UTC (permalink / raw)
  To: Marc MERLIN, David Miller
  Cc: Larry.Finger, bhutchings, linux-wireless, netdev

On Wed, 2012-04-11 at 08:30 +0200, Eric Dumazet wrote:
> Marc Merlin reported many order-1 allocations failures in TX path on its
> wireless setup, that dont make any sense with MTU=1500 network, and non
> SG capable hardware.
> 
> After investigation, it turns out TCP uses sk_stream_alloc_skb() and
> used as a convention skb_tailroom(skb) to know how many bytes of data
> payload could be put in this skb (for non SG capable devices)
> 
...
> 
> Reported-by: Marc MERLIN <marc@merlins.org>
> Tested-by: Marc MERLIN <marc@merlins.org>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

David, I forgot to say this should be backported to 3.2 & 3.3

commit 87fb4b7b533073 (net: more accurate skb truesize) did the
placement of skb_shared_info at the end of skb head, so
sk_stream_alloc_skb() had to reserve more room so that tailroom stayed
at MSS




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH] tcp: avoid order-1 allocations on wifi and tx path
@ 2012-04-11  7:38                               ` Eric Dumazet
  0 siblings, 0 replies; 72+ messages in thread
From: Eric Dumazet @ 2012-04-11  7:38 UTC (permalink / raw)
  To: Marc MERLIN, David Miller
  Cc: Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ,
	bhutchings-s/n/eUQHGBpZroRs9YW3xA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA

On Wed, 2012-04-11 at 08:30 +0200, Eric Dumazet wrote:
> Marc Merlin reported many order-1 allocations failures in TX path on its
> wireless setup, that dont make any sense with MTU=1500 network, and non
> SG capable hardware.
> 
> After investigation, it turns out TCP uses sk_stream_alloc_skb() and
> used as a convention skb_tailroom(skb) to know how many bytes of data
> payload could be put in this skb (for non SG capable devices)
> 
...
> 
> Reported-by: Marc MERLIN <marc-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
> Tested-by: Marc MERLIN <marc-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
> Signed-off-by: Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

David, I forgot to say this should be backported to 3.2 & 3.3

commit 87fb4b7b533073 (net: more accurate skb truesize) did the
placement of skb_shared_info at the end of skb head, so
sk_stream_alloc_skb() had to reserve more room so that tailroom stayed
at MSS



--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH] net: allow pskb_expand_head() to get maximum tailroom
  2012-04-11  6:08                         ` [PATCH] net: allow pskb_expand_head() to get maximum tailroom Eric Dumazet
@ 2012-04-11 14:11                             ` David Miller
  0 siblings, 0 replies; 72+ messages in thread
From: David Miller @ 2012-04-11 14:11 UTC (permalink / raw)
  To: eric.dumazet; +Cc: marc, Larry.Finger, bhutchings, linux-wireless, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 11 Apr 2012 08:08:39 +0200

> Marc Merlin reported many order-1 allocations failures in TX path on its
> wireless setup, that dont make any sense with MTU=1500 network, and non
> SG capable hardware.
> 
> Turns out part of the problem comes from pskb_expand_head() not using
> ksize() to get exact head size given by kmalloc(). Doing the same thing
> than __alloc_skb() allows more tailroom in skb and can prevent future
> reallocations.
> 
> As a bonus, struct skb_shared_info becomes cache line aligned.
> 
> Reported-by: Marc MERLIN <marc@merlins.org>
> Tested-by: Marc MERLIN <marc@merlins.org>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied and queued up for -stable.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH] net: allow pskb_expand_head() to get maximum tailroom
@ 2012-04-11 14:11                             ` David Miller
  0 siblings, 0 replies; 72+ messages in thread
From: David Miller @ 2012-04-11 14:11 UTC (permalink / raw)
  To: eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w
  Cc: marc-xnduUnryOU1AfugRpC6u6w, Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ,
	bhutchings-s/n/eUQHGBpZroRs9YW3xA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA

From: Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date: Wed, 11 Apr 2012 08:08:39 +0200

> Marc Merlin reported many order-1 allocations failures in TX path on its
> wireless setup, that dont make any sense with MTU=1500 network, and non
> SG capable hardware.
> 
> Turns out part of the problem comes from pskb_expand_head() not using
> ksize() to get exact head size given by kmalloc(). Doing the same thing
> than __alloc_skb() allows more tailroom in skb and can prevent future
> reallocations.
> 
> As a bonus, struct skb_shared_info becomes cache line aligned.
> 
> Reported-by: Marc MERLIN <marc-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
> Tested-by: Marc MERLIN <marc-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
> Signed-off-by: Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Applied and queued up for -stable.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH] tcp: avoid order-1 allocations on wifi and tx path
  2012-04-11  6:30                             ` Eric Dumazet
  (?)
  (?)
@ 2012-04-11 14:11                             ` David Miller
  -1 siblings, 0 replies; 72+ messages in thread
From: David Miller @ 2012-04-11 14:11 UTC (permalink / raw)
  To: eric.dumazet; +Cc: marc, Larry.Finger, bhutchings, linux-wireless, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 11 Apr 2012 08:30:48 +0200

> Marc Merlin reported many order-1 allocations failures in TX path on its
> wireless setup, that dont make any sense with MTU=1500 network, and non
> SG capable hardware.
> 
> After investigation, it turns out TCP uses sk_stream_alloc_skb() and
> used as a convention skb_tailroom(skb) to know how many bytes of data
> payload could be put in this skb (for non SG capable devices)
> 
> Note : these skb used kmalloc-4096 (MTU=1500 + MAX_HEADER +
> sizeof(struct skb_shared_info) being above 2048)
> 
> Later, mac80211 layer need to add some bytes at the tail of skb
> (IEEE80211_ENCRYPT_TAILROOM = 18 bytes) and since no more tailroom is
> available has to call pskb_expand_head() and request order-1
> allocations.
> 
> This patch changes sk_stream_alloc_skb() so that only
> sk->sk_prot->max_header bytes of headroom are reserved, and use a new
> skb field, avail_size to hold the data payload limit.
> 
> This way, order-0 allocations done by TCP stack can leave more than 2 KB
> of tailroom and no more allocation is performed in mac80211 layer (or
> any layer needing some tailroom)
> 
> avail_size is unioned with mark/dropcount, since mark will be set later
> in IP stack for output packets. Therefore, skb size is unchanged.
> 
> Reported-by: Marc MERLIN <marc@merlins.org>
> Tested-by: Marc MERLIN <marc@merlins.org>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied and queued up for -stable, thanks Eric.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH] tcp: avoid order-1 allocations on wifi and tx path
  2012-04-11  7:38                               ` Eric Dumazet
@ 2012-04-11 14:12                                 ` David Miller
  -1 siblings, 0 replies; 72+ messages in thread
From: David Miller @ 2012-04-11 14:12 UTC (permalink / raw)
  To: eric.dumazet; +Cc: marc, Larry.Finger, bhutchings, linux-wireless, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 11 Apr 2012 09:38:02 +0200

> David, I forgot to say this should be backported to 3.2 & 3.3

Yep.

> commit 87fb4b7b533073 (net: more accurate skb truesize) did the
> placement of skb_shared_info at the end of skb head, so
> sk_stream_alloc_skb() had to reserve more room so that tailroom stayed
> at MSS

Will keep that in mind, thanks.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH] tcp: avoid order-1 allocations on wifi and tx path
@ 2012-04-11 14:12                                 ` David Miller
  0 siblings, 0 replies; 72+ messages in thread
From: David Miller @ 2012-04-11 14:12 UTC (permalink / raw)
  To: eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w
  Cc: marc-xnduUnryOU1AfugRpC6u6w, Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ,
	bhutchings-s/n/eUQHGBpZroRs9YW3xA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA

From: Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date: Wed, 11 Apr 2012 09:38:02 +0200

> David, I forgot to say this should be backported to 3.2 & 3.3

Yep.

> commit 87fb4b7b533073 (net: more accurate skb truesize) did the
> placement of skb_shared_info at the end of skb head, so
> sk_stream_alloc_skb() had to reserve more room so that tailroom stayed
> at MSS

Will keep that in mind, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.4.4/amd64 full interrupt hangs under big nfs copies
  2012-04-11  5:27                       ` Marc MERLIN
  2012-04-11  5:43                           ` Eric Dumazet
  2012-04-11  6:08                         ` [PATCH] net: allow pskb_expand_head() to get maximum tailroom Eric Dumazet
@ 2012-07-15 21:59                         ` Marc MERLIN
  2012-07-16  6:18                           ` Eric Dumazet
  2 siblings, 1 reply; 72+ messages in thread
From: Marc MERLIN @ 2012-07-15 21:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Larry.Finger, bhutchings, linux-wireless, netdev

On Tue, Apr 10, 2012 at 10:27:33PM -0700, Marc MERLIN wrote:
> On Tue, Apr 10, 2012 at 08:11:03AM +0200, Eric Dumazet wrote:
> > Please try following patch, as it solved the problem for me (no more
> > order-1 allocations in tx path)
> 
> I applied our patch to 3.3.1 and cannot reproduce the problem anymore.
> 
> I'll leave a big wireless copy running overnight just in case, but I think
> you fixed it.

Mmmh, so I'm running 3.4.4 and I had another full machine hang while copying
big files (gigabytes) over wireless via NFS.
The laptop self recovered after 5mn or so (mouse cursor would not even
move) and I was able to kill -9 the process (midnight commander).
mc did not actually stop for another 4mn or so (i.e. it took that long for
the process to come out of kernel hung state), but the machine was usable
during that time.
Note that copying the same data with scp works fine.
NFS mount looks like this:
gargamel:/mnt/dshelf2/ /net/gargamel/mnt/dshelf2 nfs4 rw,nosuid,nodev,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.205.7,local_lock=none,addr=192.168.205.3 0 0

I didn't have anything like last time in the kernel logs, and more
annoyingly, ps -elf does not show anything for any process in WCHAN,
making pointing the finger a bit harder (procps-ng 3.3.3 does not show
anything other than '-' in WCHAN for any process with 3.4.4).

My understanding is that user space calling drivers that shut off all
interrupts for extended periods of time (as least I think so since my mouse
cursor would not move), is still a kernel bug.

For what it's worth, copying 1GB of data in lots of small files does not
cause problems, it seems that it's big files that cause a problem since they
likely fill a buffer somewhere while interrupts are disabled.

Do you have an idea of how I can find out where my mc process is stuck in
the kernel?
Should I reproduce with specific sysrq output?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.4.4/amd64 full interrupt hangs under big nfs copies
  2012-07-15 21:59                         ` 3.4.4/amd64 full interrupt hangs under big nfs copies Marc MERLIN
@ 2012-07-16  6:18                           ` Eric Dumazet
  2012-07-16 15:18                             ` Marc MERLIN
  0 siblings, 1 reply; 72+ messages in thread
From: Eric Dumazet @ 2012-07-16  6:18 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: David Miller, Larry.Finger, bhutchings, linux-wireless, netdev

On Sun, 2012-07-15 at 14:59 -0700, Marc MERLIN wrote:
> On Tue, Apr 10, 2012 at 10:27:33PM -0700, Marc MERLIN wrote:
> > On Tue, Apr 10, 2012 at 08:11:03AM +0200, Eric Dumazet wrote:
> > > Please try following patch, as it solved the problem for me (no more
> > > order-1 allocations in tx path)
> > 
> > I applied our patch to 3.3.1 and cannot reproduce the problem anymore.
> > 
> > I'll leave a big wireless copy running overnight just in case, but I think
> > you fixed it.
> 
> Mmmh, so I'm running 3.4.4 and I had another full machine hang while copying
> big files (gigabytes) over wireless via NFS.
> The laptop self recovered after 5mn or so (mouse cursor would not even
> move) and I was able to kill -9 the process (midnight commander).
> mc did not actually stop for another 4mn or so (i.e. it took that long for
> the process to come out of kernel hung state), but the machine was usable
> during that time.
> Note that copying the same data with scp works fine.
> NFS mount looks like this:
> gargamel:/mnt/dshelf2/ /net/gargamel/mnt/dshelf2 nfs4 rw,nosuid,nodev,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.205.7,local_lock=none,addr=192.168.205.3 0 0
> 
> I didn't have anything like last time in the kernel logs, and more
> annoyingly, ps -elf does not show anything for any process in WCHAN,
> making pointing the finger a bit harder (procps-ng 3.3.3 does not show
> anything other than '-' in WCHAN for any process with 3.4.4).
> 
> My understanding is that user space calling drivers that shut off all
> interrupts for extended periods of time (as least I think so since my mouse
> cursor would not move), is still a kernel bug.
> 
> For what it's worth, copying 1GB of data in lots of small files does not
> cause problems, it seems that it's big files that cause a problem since they
> likely fill a buffer somewhere while interrupts are disabled.
> 
> Do you have an idea of how I can find out where my mc process is stuck in
> the kernel?
> Should I reproduce with specific sysrq output?

Just to clarify, you get this freeze when transferring a big file from a
remote NFS server to your PC, (aka a download), not the reverse way ?

If so, you might hit OOM condition because iwlwifi uses big/fat RX
buffers, I never understood why...

(amsdu_size_8K = 1)

Storing an MTU=1500 frams in 8KB of memory sounds really bad.


diff --git a/drivers/net/wireless/iwlwifi/iwl-drv.c b/drivers/net/wireless/iwlwifi/iwl-drv.c
index cc41cfa..434b924 100644
--- a/drivers/net/wireless/iwlwifi/iwl-drv.c
+++ b/drivers/net/wireless/iwlwifi/iwl-drv.c
@@ -1006,7 +1006,7 @@ void iwl_drv_stop(struct iwl_drv *drv)
 
 /* shared module parameters */
 struct iwl_mod_params iwlwifi_mod_params = {
-	.amsdu_size_8K = 1,
+	.amsdu_size_8K = 0,
 	.restart_fw = 1,
 	.plcp_check = true,
 	.bt_coex_active = true,



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: 3.4.4/amd64 full interrupt hangs under big nfs copies
  2012-07-16  6:18                           ` Eric Dumazet
@ 2012-07-16 15:18                             ` Marc MERLIN
  2012-07-16 16:21                               ` Eric Dumazet
  0 siblings, 1 reply; 72+ messages in thread
From: Marc MERLIN @ 2012-07-16 15:18 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Larry.Finger, bhutchings, linux-wireless, netdev

On Mon, Jul 16, 2012 at 08:18:49AM +0200, Eric Dumazet wrote:
> > My understanding is that user space calling drivers that shut off all
> > interrupts for extended periods of time (as least I think so since my mouse
> > cursor would not move), is still a kernel bug.
> > 
> > For what it's worth, copying 1GB of data in lots of small files does not
> > cause problems, it seems that it's big files that cause a problem since they
> > likely fill a buffer somewhere while interrupts are disabled.
> > 
> > Do you have an idea of how I can find out where my mc process is stuck in
> > the kernel?
> > Should I reproduce with specific sysrq output?
> 
> Just to clarify, you get this freeze when transferring a big file from a
> remote NFS server to your PC, (aka a download), not the reverse way ?
 
No, it's atually when I'm 'uploading' from my laptop to my server.
One interesting thing is that my server is running lvm2 with snapshots,
which makes writes slower than my laptop can push data over the network, so
it's definitely causing buffers to fill up.
I just did a download test and got 4.5MB/s sustained without problems.

> If so, you might hit OOM condition because iwlwifi uses big/fat RX
> buffers, I never understood why...
> 
> (amsdu_size_8K = 1)
> 
> Storing an MTU=1500 frams in 8KB of memory sounds really bad.
> 
> diff --git a/drivers/net/wireless/iwlwifi/iwl-drv.c b/drivers/net/wireless/iwlwifi/iwl-drv.c
> index cc41cfa..434b924 100644
> --- a/drivers/net/wireless/iwlwifi/iwl-drv.c
> +++ b/drivers/net/wireless/iwlwifi/iwl-drv.c
> @@ -1006,7 +1006,7 @@ void iwl_drv_stop(struct iwl_drv *drv)
>  
>  /* shared module parameters */
>  struct iwl_mod_params iwlwifi_mod_params = {
> -	.amsdu_size_8K = 1,
> +	.amsdu_size_8K = 0,

So, do you recomend I try this if my problem is with TX and not RX?

More generally, is there a tip you have for me to get WCHAN in ps on a
process that is causing this problem, or will sysrq-W/sysrq-D show this, or
do I need something like sysrq-T ?

I just reproduced this while having a loop that did 
while :; do date; echo w > /proc/sysrq-trigger; sleep 30; done

I got a nice 30mn (!) hang, including the lines below:
[68151.449220] kworker/0:0: page allocation failure: order:1, mode:0x4020
[68151.449225] Pid: 9004, comm: kworker/0:0 Tainted: G         C O 3.4.4-amd64-preempt-noide-20120410 #1
[68151.449227] Call Trace:
[68151.449228]  <IRQ>  [<ffffffff810bf8d8>] ? warn_alloc_failed+0x11f/0x132
[68151.449239]  [<ffffffff810453ff>] ? __mod_timer+0x13a/0x14c
[68151.449243]  [<ffffffff810c26e2>] ? __alloc_pages_nodemask+0x72f/0x7df
[68151.449256]  [<ffffffff810ecf10>] ? alloc_pages_current+0xc7/0xe4
[68151.449266]  [<ffffffffa04ca247>] ? iwlagn_rx_allocate+0x97/0x24d [iwlwifi]
[68151.449273]  [<ffffffffa04cb27b>] ? iwl_irq_tasklet+0x6e4/0x838 [iwlwifi]
[68151.449279]  [<ffffffff8103f6e9>] ? tasklet_action+0x79/0xc8
[68151.449283]  [<ffffffff8103f205>] ? __do_softirq+0xc0/0x188
[68151.449288]  [<ffffffff8136879c>] ? call_softirq+0x1c/0x30
[68151.449293]  [<ffffffff8100fcf9>] ? do_softirq+0x3c/0x7b
[68151.449297]  [<ffffffff8103f4f3>] ? irq_exit+0x3d/0xa5
[68151.449301]  [<ffffffff8100fa1e>] ? do_IRQ+0x81/0x97
[68151.449306]  [<ffffffff81362e2a>] ? common_interrupt+0x6a/0x6a
[68151.449308]  <EOI>  [<ffffffff811b7bca>] ? idr_get_next+0x7d/0x92
[68151.449318]  [<ffffffff81088d74>] ? css_get_next+0x59/0x97
[68151.449322]  [<ffffffff810fc115>] ? mem_cgroup_iter+0x109/0x1ab
[68151.449328]  [<ffffffff810cb31a>] ? shrink_zone+0x89/0x9b
[68151.449333]  [<ffffffff810cb73c>] ? do_try_to_free_pages+0x1e4/0x434
[68151.449338]  [<ffffffff810cbc11>] ? try_to_free_pages+0xb3/0xf9
[68151.449343]  [<ffffffff810c24a2>] ? __alloc_pages_nodemask+0x4ef/0x7df
[68151.449349]  [<ffffffff810ecf10>] ? alloc_pages_current+0xc7/0xe4
[68151.449356]  [<ffffffffa04ca247>] ? iwlagn_rx_allocate+0x97/0x24d [iwlwifi]
[68151.449363]  [<ffffffffa04ca81e>] ? iwlagn_rx_replenish+0x3a/0x3a [iwlwifi]
[68151.449369]  [<ffffffffa04ca7fc>] ? iwlagn_rx_replenish+0x18/0x3a [iwlwifi]
[68151.449373]  [<ffffffff8104ea7d>] ? process_one_work+0x16d/0x298
[68151.449379]  [<ffffffff8104f4d9>] ? worker_thread+0xc2/0x145
[68151.449383]  [<ffffffff8104f417>] ? manage_workers.isra.23+0x15b/0x15b
[68151.449386]  [<ffffffff81052788>] ? kthread+0x7d/0x85
[68151.449390]  [<ffffffff813686a4>] ? kernel_thread_helper+0x4/0x10
[68151.449395]  [<ffffffff8105270b>] ? kthread_freezable_should_stop+0x37/0x37
[68151.449399]  [<ffffffff813686a0>] ? gs_change+0x13/0x13
(...)
[68151.475337] iwlwifi 0:03:00.0: Failed to alloc_pages with GFP_ATOMIC.Only 5 free buffers remaining.


Full log below around the 30mn hang:
[68145.180343] SysRq : Show Blocked State
[68145.180348]   task                        PC stack   pid father
[68145.180469] mc              D ffff88021e213680     0  6526  21734 0x00000080
[68145.180473]  ffff8802134ff750 0000000000000086 0000000000000008 ffff88020d76a7d0
[68145.180477]  0000000000013680 ffff88001a98dfd8 ffff88001a98dfd8 ffff8802134ff750
[68145.180480]  ffff8801a299ede0 ffff88021e213680 ffff8802134ff750 ffffffff810bb429
[68145.180483] Call Trace:
[68145.180490]  [<ffffffff810bb429>] ? __lock_page+0x66/0x66
[68145.180494]  [<ffffffff81362059>] ? io_schedule+0x55/0x6b
[68145.180496]  [<ffffffff810bb42f>] ? sleep_on_page+0x6/0xa
[68145.180499]  [<ffffffff81361054>] ? __wait_on_bit+0x3e/0x71
[68145.180502]  [<ffffffff810bb577>] ? wait_on_page_bit+0x6e/0x73
[68145.180505]  [<ffffffff81052e69>] ? autoremove_wake_function+0x2a/0x2a
[68145.180540]  [<ffffffffa0199d18>] ? read_extent_buffer_pages+0x1fb/0x24b [btrfs]
[68145.180552]  [<ffffffffa0175112>] ? lock_page+0x11/0x11 [btrfs]
[68145.180564]  [<ffffffffa0176e69>] ? btree_read_extent_buffer_pages.constprop.110+0x5c/0xf3 [btrfs]
[68145.180577]  [<ffffffffa0177412>] ? read_tree_block+0x25/0x2d [btrfs]
[68145.180586]  [<ffffffffa0163087>] ? read_block_for_search.isra.32+0x2af/0x2e7 [btrfs]
[68145.180597]  [<ffffffffa01649c2>] ? btrfs_search_slot+0x48d/0x659 [btrfs]
[68145.180608]  [<ffffffffa0172d7d>] ? btrfs_lookup_csum+0x66/0x106 [btrfs]
[68145.180611]  [<ffffffff8136583e>] ? sub_preempt_count+0x83/0x94
[68145.180622]  [<ffffffffa0172fad>] ? __btrfs_lookup_bio_sums+0x190/0x30a [btrfs]
[68145.180635]  [<ffffffffa017d021>] ? btrfs_submit_bio_hook+0xa9/0x12a [btrfs]
[68145.180649]  [<ffffffffa0194af3>] ? submit_one_bio+0x82/0xb9 [btrfs]
[68145.180662]  [<ffffffffa01977c7>] ? submit_extent_page.isra.26+0x10a/0x1b2 [btrfs]
[68145.180675]  [<ffffffffa0196d91>] ? repair_io_failure+0x18e/0x18e [btrfs]
[68145.180687]  [<ffffffffa0197cf5>] ? __extent_read_full_page+0x424/0x4be [btrfs]
[68145.180699]  [<ffffffffa0196d91>] ? repair_io_failure+0x18e/0x18e [btrfs]
[68145.180712]  [<ffffffffa017eca4>] ? btrfs_writepage+0x4b/0x4b [btrfs]
[68145.180716]  [<ffffffff810c56b3>] ? __lru_cache_add+0x7b/0x8d
[68145.180728]  [<ffffffffa01988e8>] ? extent_readpages+0xaf/0xf3 [btrfs]
[68145.180741]  [<ffffffffa017eca4>] ? btrfs_writepage+0x4b/0x4b [btrfs]
[68145.180744]  [<ffffffff810c4640>] ? __do_page_cache_readahead+0x139/0x1de
[68145.180747]  [<ffffffff810c493d>] ? ra_submit+0x19/0x1d
[68145.180749]  [<ffffffff810bca23>] ? generic_file_aio_read+0x2b0/0x5d3
[68145.180753]  [<ffffffff81110627>] ? set_fd_set+0x23/0x26
[68145.180756]  [<ffffffff8110252a>] ? do_sync_read+0xab/0xe3
[68145.180759]  [<ffffffff81362b73>] ? _raw_spin_unlock_irqrestore+0x30/0x3e
[68145.180762]  [<ffffffff810595b4>] ? __wake_up+0x35/0x46
[68145.180764]  [<ffffffff81102c22>] ? vfs_read+0x9f/0xe6
[68145.180767]  [<ffffffff81103c2d>] ? fget_light+0x33/0x8d
[68145.180769]  [<ffffffff81102cae>] ? sys_read+0x45/0x6b
[68145.180772]  [<ffffffff8136723d>] ? system_call_fastpath+0x1a/0x1f
[68145.180783] Sched Debug Version: v0.10, 3.4.4-amd64-preempt-noide-20120410 #1
[68145.180785] ktime                                   : 68319835.436522
[68145.180787] sched_clk                               : 68145180.782078
[68145.180788] cpu_clk                                 : 68145180.782141
[68145.180790] jiffies                                 : 4311972254
[68145.180791] sched_clock_stable                      : 1
[68145.180792] 
[68145.180793] sysctl_sched
[68145.180795]   .sysctl_sched_latency                    : 18.000000
[68145.180796]   .sysctl_sched_min_granularity            : 2.250000
[68145.180798]   .sysctl_sched_wakeup_granularity         : 3.000000
[68145.180799]   .sysctl_sched_child_runs_first           : 0
[68145.180801]   .sysctl_sched_features                   : 24119
[68145.180803]   .sysctl_sched_tunable_scaling            : 1 (logaritmic)
[68145.180805] 
[68145.180805] cpu#0, 2893.340 MHz
[68145.180806]   .nr_running                    : 1
[68145.180808]   .load                          : 1024
[68145.180809]   .nr_switches                   : 220345791
[68145.180810]   .nr_load_updates               : 21415127
[68145.180812]   .nr_uninterruptible            : -4403
[68145.180813]   .next_balance                  : 4311.972255
[68145.180815]   .curr->pid                     : 8408
[68145.180816]   .clock                         : 68145180.058636
[68145.180818]   .cpu_load[0]                   : 0
[68145.180819]   .cpu_load[1]                   : 664
[68145.180820]   .cpu_load[2]                   : 639
[68145.180821]   .cpu_load[3]                   : 486
[68145.180823]   .cpu_load[4]                   : 397
[68145.180825] 
[68145.180825] cfs_rq[0]:/autogroup-105
[68145.180827]   .exec_clock                    : 0.000000
[68145.180828]   .MIN_vruntime                  : 0.000001
[68145.180830]   .min_vruntime                  : 632596.935417
[68145.180832]   .max_vruntime                  : 0.000001
[68145.180833]   .spread                        : 0.000000
[68145.180835]   .spread0                       : -16307488.857263
[68145.180836]   .nr_spread_over                : 0
[68145.180837]   .nr_running                    : 0
[68145.180838]   .load                          : 0
[68145.180840]   .load_avg                      : 2629.999737
[68145.180841]   .load_period                   : 5.116983
[68145.180843]   .load_contrib                  : 513
[68145.180844]   .load_tg                       : 515
[68145.180846]   .se->exec_start                : 68145177.126165
[68145.180847]   .se->vruntime                  : 16940076.867799
[68145.180849]   .se->sum_exec_runtime          : 525985.491293
[68145.180850]   .se->load.weight               : 2
[68145.180852] 
[68145.180852] cfs_rq[0]:/autogroup-728
[68145.180854]   .exec_clock                    : 0.000000
[68145.180855]   .MIN_vruntime                  : 0.000001
[68145.180857]   .min_vruntime                  : 1465.002075
[68145.180858]   .max_vruntime                  : 0.000001
[68145.180859]   .spread                        : 0.000000
[68145.180861]   .spread0                       : -16938620.790605
[68145.180862]   .nr_spread_over                : 0
[68145.180864]   .nr_running                    : 0
[68145.180865]   .load                          : 0
[68145.180866]   .load_avg                      : 718.614822
[68145.180868]   .load_period                   : 5.560256
[68145.180869]   .load_contrib                  : 121
[68145.180870]   .load_tg                       : 643
[68145.180872]   .se->exec_start                : 68145179.581565
[68145.180873]   .se->vruntime                  : 16940077.021084
[68145.180875]   .se->sum_exec_runtime          : 238.261122
[68145.180876]   .se->load.weight               : 2
[68145.180878] 
[68145.180878] cfs_rq[0]:/autogroup-77
[68145.180880]   .exec_clock                    : 0.000000
[68145.180881]   .MIN_vruntime                  : 0.000001
[68145.180883]   .min_vruntime                  : 7008.558906
[68145.180884]   .max_vruntime                  : 0.000001
[68145.180886]   .spread                        : 0.000000
[68145.180887]   .spread0                       : -16933077.233774
[68145.180888]   .nr_spread_over                : 0
[68145.180890]   .nr_running                    : 0
[68145.180891]   .load                          : 0
[68145.180892]   .load_avg                      : 639.999936
[68145.180894]   .load_period                   : 6.666654
[68145.180895]   .load_contrib                  : 96
[68145.180896]   .load_tg                       : 202
[68145.180898]   .se->exec_start                : 68145155.292812
[68145.180899]   .se->vruntime                  : 16940072.397295
[68145.180901]   .se->sum_exec_runtime          : 6875.737275
[68145.180902]   .se->load.weight               : 2
[68145.180904] 
[68145.180904] cfs_rq[0]:/autogroup-74
[68145.180906]   .exec_clock                    : 0.000000
[68145.180907]   .MIN_vruntime                  : 0.000001
[68145.180908]   .min_vruntime                  : 9096.628700
[68145.180910]   .max_vruntime                  : 0.000001
[68145.180911]   .spread                        : 0.000000
[68145.180913]   .spread0                       : -16930989.163980
[68145.180914]   .nr_spread_over                : 0
[68145.180915]   .nr_running                    : 0
[68145.180917]   .load                          : 0
[68145.180918]   .load_avg                      : 643.893952
[68145.180919]   .load_period                   : 8.513359
[68145.180921]   .load_contrib                  : 75
[68145.180922]   .load_tg                       : 75
[68145.180924]   .se->exec_start                : 68145155.254096
[68145.180926]   .se->vruntime                  : 16940072.391504
[68145.180927]   .se->sum_exec_runtime          : 8917.069991
[68145.180928]   .se->load.weight               : 2
[68145.180930] 
[68145.180930] cfs_rq[0]:/autogroup-85
[68145.180932]   .exec_clock                    : 0.000000
[68145.180933]   .MIN_vruntime                  : 0.000001
[68145.180935]   .min_vruntime                  : 461037.358394
[68145.180936]   .max_vruntime                  : 0.000001
[68145.180938]   .spread                        : 0.000000
[68145.180939]   .spread0                       : -16479048.434286
[68145.180940]   .nr_spread_over                : 0
[68145.180942]   .nr_running                    : 0
[68145.180943]   .load                          : 0
[68145.180944]   .load_avg                      : 159.999984
[68145.180946]   .load_period                   : 8.582743
[68145.180947]   .load_contrib                  : 18
[68145.180948]   .load_tg                       : 20
[68145.180950]   .se->exec_start                : 68145144.752282
[68145.180951]   .se->vruntime                  : 16940070.960978
[68145.180953]   .se->sum_exec_runtime          : 523059.429556
[68145.180954]   .se->load.weight               : 2
[68145.180956] 
[68145.180956] cfs_rq[0]:/autogroup-99
[68145.180958]   .exec_clock                    : 0.000000
[68145.180959]   .MIN_vruntime                  : 0.000001
[68145.180961]   .min_vruntime                  : 919853.728128
[68145.180962]   .max_vruntime                  : 0.000001
[68145.180963]   .spread                        : 0.000000
[68145.180965]   .spread0                       : -16020232.064552
[68145.180966]   .nr_spread_over                : 0
[68145.180968]   .nr_running                    : 0
[68145.180969]   .load                          : 0
[68145.180970]   .load_avg                      : 95.570772
[68145.180972]   .load_period                   : 8.631443
[68145.180973]   .load_contrib                  : 11
[68145.180974]   .load_tg                       : 25
[68145.180976]   .se->exec_start                : 68145166.483609
[68145.180977]   .se->vruntime                  : 16940075.698106
[68145.180979]   .se->sum_exec_runtime          : 649839.065400
[68145.180980]   .se->load.weight               : 2
[68145.180982] 
[68145.180982] cfs_rq[0]:/autogroup-53
[68145.180984]   .exec_clock                    : 0.000000
[68145.180985]   .MIN_vruntime                  : 0.000001
[68145.180987]   .min_vruntime                  : 8270588.687589
[68145.180988]   .max_vruntime                  : 0.000001
[68145.180989]   .spread                        : 0.000000
[68145.180991]   .spread0                       : -8669497.105091
[68145.180992]   .nr_spread_over                : 0
[68145.180993]   .nr_running                    : 0
[68145.180995]   .load                          : 0
[68145.180996]   .load_avg                      : 51.067853
[68145.180998]   .load_period                   : 6.626155
[68145.180999]   .load_contrib                  : 2
[68145.181000]   .load_tg                       : 1479
[68145.181002]   .se->exec_start                : 68145177.453635
[68145.181003]   .se->vruntime                  : 16940076.966610
[68145.181005]   .se->sum_exec_runtime          : 7731181.203365
[68145.181006]   .se->load.weight               : 2
[68145.181008] 
[68145.181008] cfs_rq[0]:/autogroup-216
[68145.181009]   .exec_clock                    : 0.000000
[68145.181011]   .MIN_vruntime                  : 0.000001
[68145.181012]   .min_vruntime                  : 106577.197911
[68145.181014]   .max_vruntime                  : 0.000001
[68145.181015]   .spread                        : 0.000000
[68145.181017]   .spread0                       : -16833508.594769
[68145.181018]   .nr_spread_over                : 0
[68145.181019]   .nr_running                    : 0
[68145.181020]   .load                          : 0
[68145.181022]   .load_avg                      : 22.437006
[68145.181023]   .load_period                   : 6.581971
[68145.181025]   .load_contrib                  : 3
[68145.181026]   .load_tg                       : 3
[68145.181028]   .se->exec_start                : 68145151.713745
[68145.181029]   .se->vruntime                  : 16940072.220775
[68145.181031]   .se->sum_exec_runtime          : 55613.402140
[68145.181032]   .se->load.weight               : 2
[68145.181034] 
[68145.181034] cfs_rq[0]:/autogroup-499
[68145.181036]   .exec_clock                    : 0.000000
[68145.181037]   .MIN_vruntime                  : 0.000001
[68145.181039]   .min_vruntime                  : 12143.287893
[68145.181040]   .max_vruntime                  : 0.000001
[68145.181042]   .spread                        : 0.000000
[68145.181043]   .spread0                       : -16927942.504787
[68145.181045]   .nr_spread_over                : 0
[68145.181046]   .nr_running                    : 0
[68145.181047]   .load                          : 0
[68145.181048]   .load_avg                      : 2288.736922
[68145.181050]   .load_period                   : 7.063528
[68145.181051]   .load_contrib                  : 324
[68145.181053]   .load_tg                       : 324
[68145.181054]   .se->exec_start                : 68145176.734792
[68145.181056]   .se->vruntime                  : 16940085.859991
[68145.181057]   .se->sum_exec_runtime          : 5891.241766
[68145.181058]   .se->load.weight               : 2
[68145.181060] 
[68145.181060] cfs_rq[0]:/autogroup-345
[68145.181062]   .exec_clock                    : 0.000000
[68145.181063]   .MIN_vruntime                  : 0.000001
[68145.181065]   .min_vruntime                  : 337573.928223
[68145.181066]   .max_vruntime                  : 0.000001
[68145.181067]   .spread                        : 0.000000
[68145.181069]   .spread0                       : -16602511.864457
[68145.181070]   .nr_spread_over                : 0
[68145.181072]   .nr_running                    : 0
[68145.181073]   .load                          : 0
[68145.181074]   .load_avg                      : 694.208403
[68145.181076]   .load_period                   : 6.188356
[68145.181077]   .load_contrib                  : 112
[68145.181078]   .load_tg                       : 511
[68145.181080]   .se->exec_start                : 68145174.673610
[68145.181081]   .se->vruntime                  : 16940076.816044
[68145.181083]   .se->sum_exec_runtime          : 453037.061998
[68145.181084]   .se->load.weight               : 2
[68145.181086] 
[68145.181086] cfs_rq[0]:/
[68145.181088]   .exec_clock                    : 0.000000
[68145.181089]   .MIN_vruntime                  : 0.000001
[68145.181090]   .min_vruntime                  : 16940085.792680
[68145.181092]   .max_vruntime                  : 0.000001
[68145.181093]   .spread                        : 0.000000
[68145.181095]   .spread0                       : 0.000000
[68145.181096]   .nr_spread_over                : 0
[68145.181097]   .nr_running                    : 1
[68145.181098]   .load                          : 1024
[68145.181100]   .load_avg                      : 0.000000
[68145.181101]   .load_period                   : 0.000000
[68145.181102]   .load_contrib                  : 0
[68145.181104]   .load_tg                       : 0
[68145.181105] 
[68145.181105] rt_rq[0]:
[68145.181107]   .rt_nr_running                 : 0
[68145.181108]   .rt_throttled                  : 0
[68145.181109]   .rt_time                       : 0.005109
[68145.181111]   .rt_runtime                    : 950.000000
[68145.181112] 
[68145.181113] runnable tasks:
[68145.181113]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[68145.181114] ----------------------------------------------------------------------------------------------------------
[68145.181136] R    kworker/0:1  8408  16940085.792680     29268   120               0               0               0.000000               0.000000               0.000000 /
[68145.181144] 
[68145.181144] cpu#1, 2893.340 MHz
[68145.181146]   .nr_running                    : 1
[68145.181147]   .load                          : 832
[68145.181148]   .nr_switches                   : 215554245
[68145.181150]   .nr_load_updates               : 7918869
[68145.181151]   .nr_uninterruptible            : 2380
[68145.181152]   .next_balance                  : 4311.972282
[68145.181154]   .curr->pid                     : 8880
[68145.181155]   .clock                         : 68145180.268053
[68145.181157]   .cpu_load[0]                   : 879
[68145.181158]   .cpu_load[1]                   : 706
[68145.181159]   .cpu_load[2]                   : 505
[68145.181160]   .cpu_load[3]                   : 322
[68145.181162]   .cpu_load[4]                   : 232
[68145.181163] 
[68145.181164] cfs_rq[1]:/autogroup-53
[68145.181165]   .exec_clock                    : 0.000000
[68145.181166]   .MIN_vruntime                  : 0.000001
[68145.181168]   .min_vruntime                  : 10319271.413405
[68145.181169]   .max_vruntime                  : 0.000001
[68145.181171]   .spread                        : 0.000000
[68145.181172]   .spread0                       : -6620814.379275
[68145.181174]   .nr_spread_over                : 0
[68145.181175]   .nr_running                    : 0
[68145.181176]   .load                          : 0
[68145.181177]   .load_avg                      : 2668.957401
[68145.181179]   .load_period                   : 6.733473
[68145.181180]   .load_contrib                  : 430
[68145.181181]   .load_tg                       : 1479
[68145.181183]   .se->exec_start                : 68145180.194582
[68145.181184]   .se->vruntime                  : 19124295.713641
[68145.181186]   .se->sum_exec_runtime          : 9934376.153325
[68145.181187]   .se->load.weight               : 2
[68145.181189] 
[68145.181189] cfs_rq[1]:/autogroup-728
[68145.181191]   .exec_clock                    : 0.000000
[68145.181192]   .MIN_vruntime                  : 0.000001
[68145.181193]   .min_vruntime                  : 805.626907
[68145.181195]   .max_vruntime                  : 0.000001
[68145.181196]   .spread                        : 0.000000
[68145.181198]   .spread0                       : -16939280.165773
[68145.181199]   .nr_spread_over                : 0
[68145.181200]   .nr_running                    : 1
[68145.181202]   .load                          : 526
[68145.181203]   .load_avg                      : 4387.132719
[68145.181204]   .load_period                   : 8.401223
[68145.181206]   .load_contrib                  : 522
[68145.181207]   .load_tg                       : 643
[68145.181209]   .se->exec_start                : 68145180.268053
[68145.181210]   .se->vruntime                  : 19124304.586462
[68145.181212]   .se->sum_exec_runtime          : 174.761132
[68145.181213]   .se->load.weight               : 832
[68145.181214] 
[68145.181215] cfs_rq[1]:/autogroup-77
[68145.181216]   .exec_clock                    : 0.000000
[68145.181217]   .MIN_vruntime                  : 0.000001
[68145.181219]   .min_vruntime                  : 12354.484454
[68145.181220]   .max_vruntime                  : 0.000001
[68145.181222]   .spread                        : 0.000000
[68145.181223]   .spread0                       : -16927731.308226
[68145.181225]   .nr_spread_over                : 0
[68145.181226]   .nr_running                    : 0
[68145.181227]   .load                          : 0
[68145.181228]   .load_avg                      : 639.999936
[68145.181230]   .load_period                   : 5.988008
[68145.181231]   .load_contrib                  : 106
[68145.181232]   .load_tg                       : 202
[68145.181234]   .se->exec_start                : 68145155.131954
[68145.181235]   .se->vruntime                  : 19124277.088025
[68145.181237]   .se->sum_exec_runtime          : 12258.830758
[68145.181238]   .se->load.weight               : 2
[68145.181240] 
[68145.181240] cfs_rq[1]:/autogroup-13
[68145.181242]   .exec_clock                    : 0.000000
[68145.181243]   .MIN_vruntime                  : 0.000001
[68145.181245]   .min_vruntime                  : 7268.331092
[68145.181246]   .max_vruntime                  : 0.000001
[68145.181247]   .spread                        : 0.000000
[68145.181249]   .spread0                       : -16932817.461588
[68145.181250]   .nr_spread_over                : 0
[68145.181251]   .nr_running                    : 0
[68145.181253]   .load                          : 0
[68145.181254]   .load_avg                      : 324.714656
[68145.181255]   .load_period                   : 5.489619
[68145.181257]   .load_contrib                  : 59
[68145.181258]   .load_tg                       : 59
[68145.181259]   .se->exec_start                : 68145150.974036
[68145.181261]   .se->vruntime                  : 19124277.210880
[68145.181262]   .se->sum_exec_runtime          : 3628.001811
[68145.181264]   .se->load.weight               : 2
[68145.181265] 
[68145.181266] cfs_rq[1]:/autogroup-52
[68145.181267]   .exec_clock                    : 0.000000
[68145.181268]   .MIN_vruntime                  : 0.000001
[68145.181270]   .min_vruntime                  : 244250.987333
[68145.181271]   .max_vruntime                  : 0.000001
[68145.181273]   .spread                        : 0.000000
[68145.181274]   .spread0                       : -16695834.805347
[68145.181275]   .nr_spread_over                : 0
[68145.181277]   .nr_running                    : 0
[68145.181278]   .load                          : 0
[68145.181279]   .load_avg                      : 339.904928
[68145.181281]   .load_period                   : 5.831847
[68145.181282]   .load_contrib                  : 58
[68145.181283]   .load_tg                       : 58
[68145.181285]   .se->exec_start                : 68145155.473960
[68145.181287]   .se->vruntime                  : 19124277.987459
[68145.181288]   .se->sum_exec_runtime          : 244497.514450
[68145.181289]   .se->load.weight               : 2
[68145.181291] 
[68145.181292] cfs_rq[1]:/autogroup-751
[68145.181293]   .exec_clock                    : 0.000000
[68145.181294]   .MIN_vruntime                  : 0.000001
[68145.181296]   .min_vruntime                  : 52.906532
[68145.181297]   .max_vruntime                  : 0.000001
[68145.181299]   .spread                        : 0.000000
[68145.181300]   .spread0                       : -16940032.886148
[68145.181301]   .nr_spread_over                : 0
[68145.181303]   .nr_running                    : 0
[68145.181304]   .load                          : 0
[68145.181305]   .load_avg                      : 329.284768
[68145.181307]   .load_period                   : 5.489338
[68145.181308]   .load_contrib                  : 59
[68145.181309]   .load_tg                       : 59
[68145.181311]   .se->exec_start                : 68145151.059469
[68145.181312]   .se->vruntime                  : 19124277.756132
[68145.181314]   .se->sum_exec_runtime          : 44.955108
[68145.181315]   .se->load.weight               : 2
[68145.181316] 
[68145.181317] cfs_rq[1]:/autogroup-105
[68145.181318]   .exec_clock                    : 0.000000
[68145.181320]   .MIN_vruntime                  : 0.000001
[68145.181321]   .min_vruntime                  : 586685.915590
[68145.181323]   .max_vruntime                  : 0.000001
[68145.181324]   .spread                        : 0.000000
[68145.181326]   .spread0                       : -16353399.877090
[68145.181327]   .nr_spread_over                : 0
[68145.181328]   .nr_running                    : 0
[68145.181329]   .load                          : 0
[68145.181331]   .load_avg                      : 13.583718
[68145.181332]   .load_period                   : 5.192153
[68145.181333]   .load_contrib                  : 2
[68145.181335]   .load_tg                       : 515
[68145.181336]   .se->exec_start                : 68145167.025810
[68145.181338]   .se->vruntime                  : 19124277.682220
[68145.181339]   .se->sum_exec_runtime          : 451469.872335
[68145.181341]   .se->load.weight               : 2
[68145.181342] 
[68145.181342] cfs_rq[1]:/autogroup-85
[68145.181344]   .exec_clock                    : 0.000000
[68145.181345]   .MIN_vruntime                  : 0.000001
[68145.181347]   .min_vruntime                  : 457615.974873
[68145.181348]   .max_vruntime                  : 0.000001
[68145.181350]   .spread                        : 0.000000
[68145.181351]   .spread0                       : -16482469.817807
[68145.181353]   .nr_spread_over                : 0
[68145.181354]   .nr_running                    : 0
[68145.181355]   .load                          : 0
[68145.181357]   .load_avg                      : 22.432540
[68145.181358]   .load_period                   : 9.882693
[68145.181360]   .load_contrib                  : 2
[68145.181361]   .load_tg                       : 14
[68145.181362]   .se->exec_start                : 68145166.823349
[68145.181364]   .se->vruntime                  : 19124277.441135
[68145.181366]   .se->sum_exec_runtime          : 496686.043300
[68145.181367]   .se->load.weight               : 2
[68145.181368] 
[68145.181369] cfs_rq[1]:/autogroup-99
[68145.181370]   .exec_clock                    : 0.000000
[68145.181372]   .MIN_vruntime                  : 0.000001
[68145.181373]   .min_vruntime                  : 886895.365957
[68145.181374]   .max_vruntime                  : 0.000001
[68145.181376]   .spread                        : 0.000000
[68145.181377]   .spread0                       : -16053191.714824
[68145.181379]   .nr_spread_over                : 0
[68145.181380]   .nr_running                    : 0
[68145.181381]   .load                          : 0
[68145.181382]   .load_avg                      : 19.102799
[68145.181384]   .load_period                   : 6.184140
[68145.181385]   .load_contrib                  : 3
[68145.181386]   .load_tg                       : 20
[68145.181388]   .se->exec_start                : 68145174.705985
[68145.181389]   .se->vruntime                  : 19124286.557447
[68145.181391]   .se->sum_exec_runtime          : 563791.520677
[68145.181392]   .se->load.weight               : 2
[68145.181394] 
[68145.181394] cfs_rq[1]:/autogroup-345
[68145.181395]   .exec_clock                    : 0.000000
[68145.181397]   .MIN_vruntime                  : 0.000001
[68145.181398]   .min_vruntime                  : 390277.323389
[68145.181400]   .max_vruntime                  : 0.000001
[68145.181401]   .spread                        : 0.000000
[68145.181402]   .spread0                       : -16549809.757392
[68145.181404]   .nr_spread_over                : 0
[68145.181405]   .nr_running                    : 0
[68145.181406]   .load                          : 0
[68145.181408]   .load_avg                      : 3.038718
[68145.181409]   .load_period                   : 9.886871
[68145.181410]   .load_contrib                  : 0
[68145.181412]   .load_tg                       : 431
[68145.181413]   .se->exec_start                : 68145154.709399
[68145.181415]   .se->vruntime                  : 19124277.152141
[68145.181416]   .se->sum_exec_runtime          : 488635.600464
[68145.181418]   .se->load.weight               : 2
[68145.181419] 
[68145.181419] cfs_rq[1]:/
[68145.181420]   .exec_clock                    : 0.000000
[68145.181422]   .MIN_vruntime                  : 0.000001
[68145.181423]   .min_vruntime                  : 19124304.595285
[68145.181425]   .max_vruntime                  : 0.000001
[68145.181426]   .spread                        : 0.000000
[68145.181428]   .spread0                       : 2184217.514504
[68145.181429]   .nr_spread_over                : 0
[68145.181430]   .nr_running                    : 1
[68145.181432]   .load                          : 832
[68145.181433]   .load_avg                      : 0.000000
[68145.181434]   .load_period                   : 0.000000
[68145.181436]   .load_contrib                  : 0
[68145.181437]   .load_tg                       : 0
[68145.181438] 
[68145.181438] rt_rq[1]:
[68145.181439]   .rt_nr_running                 : 0
[68145.181441]   .rt_throttled                  : 0
[68145.181442]   .rt_time                       : 0.002810
[68145.181443]   .rt_runtime                    : 950.000000
[68145.181445] 
[68145.181446] runnable tasks:
[68145.181446]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[68145.181447] ----------------------------------------------------------------------------------------------------------
[68145.181467] R           bash  8880       804.468551      1051   123               0               0               0.000000               0.000000               0.000000 /autogroup-728
[68145.181474] 
[68145.181474] cpu#2, 2893.340 MHz
[68145.181475]   .nr_running                    : 1
[68145.181477]   .load                          : 1006
[68145.181478]   .nr_switches                   : 281539588
[68145.181479]   .nr_load_updates               : 5389825
[68145.181481]   .nr_uninterruptible            : 4408
[68145.181482]   .next_balance                  : 4311.972269
[68145.181484]   .curr->pid                     : 8272
[68145.181485]   .clock                         : 68145181.347401
[68145.181486]   .cpu_load[0]                   : 1006
[68145.181488]   .cpu_load[1]                   : 1003
[68145.181489]   .cpu_load[2]                   : 979
[68145.181490]   .cpu_load[3]                   : 940
[68145.181491]   .cpu_load[4]                   : 926
[68145.181493] 
[68145.181493] cfs_rq[2]:/autogroup-53
[68145.181495]   .exec_clock                    : 0.000000
[68145.181496]   .MIN_vruntime                  : 0.000001
[68145.181498]   .min_vruntime                  : 14748239.445043
[68145.181499]   .max_vruntime                  : 0.000001
[68145.181500]   .spread                        : 0.000000
[68145.181502]   .spread0                       : -2191847.635738
[68145.181503]   .nr_spread_over                : 0
[68145.181505]   .nr_running                    : 1
[68145.181506]   .load                          : 1024
[68145.181507]   .load_avg                      : 8875.504327
[68145.181509]   .load_period                   : 8.171060
[68145.181510]   .load_contrib                  : 1043
[68145.181511]   .load_tg                       : 1480
[68145.181513]   .se->exec_start                : 68145181.347401
[68145.181514]   .se->vruntime                  : 23834838.341563
[68145.181516]   .se->sum_exec_runtime          : 14308302.530944
[68145.181517]   .se->load.weight               : 1006
[68145.181518] 
[68145.181519] cfs_rq[2]:/
[68145.181520]   .exec_clock                    : 0.000000
[68145.181522]   .MIN_vruntime                  : 0.000001
[68145.181523]   .min_vruntime                  : 23834838.341563
[68145.181525]   .max_vruntime                  : 0.000001
[68145.181526]   .spread                        : 0.000000
[68145.181527]   .spread0                       : 6894751.260782
[68145.181529]   .nr_spread_over                : 0
[68145.181530]   .nr_running                    : 1
[68145.181531]   .load                          : 1006
[68145.181533]   .load_avg                      : 0.000000
[68145.181534]   .load_period                   : 0.000000
[68145.181535]   .load_contrib                  : 0
[68145.181536]   .load_tg                       : 0
[68145.181538] 
[68145.181538] rt_rq[2]:
[68145.181539]   .rt_nr_running                 : 0
[68145.181540]   .rt_throttled                  : 0
[68145.181542]   .rt_time                       : 0.002557
[68145.181543]   .rt_runtime                    : 950.000000
[68145.181545] 
[68145.181545] runnable tasks:
[68145.181546]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[68145.181547] ----------------------------------------------------------------------------------------------------------
[68145.181556] Rplugin-containe  8272  14748239.445043  94987642   120               0               0               0.000000               0.000000               0.000000 /autogroup-53
[68145.181571] 
[68145.181571] cpu#3, 2893.340 MHz
[68145.181573]   .nr_running                    : 0
[68145.181574]   .load                          : 0
[68145.181575]   .nr_switches                   : 284183655
[68145.181576]   .nr_load_updates               : 5361929
[68145.181578]   .nr_uninterruptible            : -2384
[68145.181579]   .next_balance                  : 4311.972267
[68145.181581]   .curr->pid                     : 0
[68145.181582]   .clock                         : 68145181.363470
[68145.181583]   .cpu_load[0]                   : 923
[68145.181584]   .cpu_load[1]                   : 466
[68145.181586]   .cpu_load[2]                   : 287
[68145.181587]   .cpu_load[3]                   : 227
[68145.181588]   .cpu_load[4]                   : 173
[68145.181590] 
[68145.181590] cfs_rq[3]:/autogroup-345
[68145.181591]   .exec_clock                    : 0.000000
[68145.181593]   .MIN_vruntime                  : 0.000001
[68145.181594]   .min_vruntime                  : 271574.282861
[68145.181596]   .max_vruntime                  : 0.000001
[68145.181597]   .spread                        : 0.000000
[68145.181598]   .spread0                       : -16668512.797920
[68145.181600]   .nr_spread_over                : 0
[68145.181601]   .nr_running                    : 0
[68145.181602]   .load                          : 0
[68145.181604]   .load_avg                      : 2929.355520
[68145.181605]   .load_period                   : 8.069625
[68145.181606]   .load_contrib                  : 363
[68145.181608]   .load_tg                       : 431
[68145.181609]   .se->exec_start                : 68145181.362481
[68145.181611]   .se->vruntime                  : 24275013.527067
[68145.181612]   .se->sum_exec_runtime          : 337338.760030
[68145.181614]   .se->load.weight               : 2
[68145.181615] 
[68145.181616] cfs_rq[3]:/autogroup-99
[68145.181617]   .exec_clock                    : 0.000000
[68145.181618]   .MIN_vruntime                  : 0.000001
[68145.181620]   .min_vruntime                  : 709630.528938
[68145.181621]   .max_vruntime                  : 0.000001
[68145.181623]   .spread                        : 0.000000
[68145.181624]   .spread0                       : -16230456.551843
[68145.181625]   .nr_spread_over                : 0
[68145.181627]   .nr_running                    : 0
[68145.181628]   .load                          : 0
[68145.181629]   .load_avg                      : 80.933248
[68145.181631]   .load_period                   : 7.593330
[68145.181632]   .load_contrib                  : 10
[68145.181633]   .load_tg                       : 20
[68145.181635]   .se->exec_start                : 68145172.520017
[68145.181636]   .se->vruntime                  : 24275013.536371
[68145.181638]   .se->sum_exec_runtime          : 455298.062310
[68145.181639]   .se->load.weight               : 2
[68145.181640] 
[68145.181641] cfs_rq[3]:/autogroup-53
[68145.181642]   .exec_clock                    : 0.000000
[68145.181643]   .MIN_vruntime                  : 0.000001
[68145.181645]   .min_vruntime                  : 15285191.917887
[68145.181646]   .max_vruntime                  : 0.000001
[68145.181648]   .spread                        : 0.000000
[68145.181649]   .spread0                       : -1654895.162894
[68145.181651]   .nr_spread_over                : 0
[68145.181652]   .nr_running                    : 0
[68145.181653]   .load                          : 0
[68145.181654]   .load_avg                      : 27.589206
[68145.181656]   .load_period                   : 7.598913
[68145.181657]   .load_contrib                  : 3
[68145.181658]   .load_tg                       : 1480
[68145.181660]   .se->exec_start                : 68145171.243075
[68145.181661]   .se->vruntime                  : 24275021.987757
[68145.181663]   .se->sum_exec_runtime          : 15157215.916175
[68145.181664]   .se->load.weight               : 2
[68145.181666] 
[68145.181666] cfs_rq[3]:/
[68145.181667]   .exec_clock                    : 0.000000
[68145.181669]   .MIN_vruntime                  : 0.000001
[68145.181670]   .min_vruntime                  : 24275021.987757
[68145.181672]   .max_vruntime                  : 0.000001
[68145.181673]   .spread                        : 0.000000
[68145.181674]   .spread0                       : 7334934.906976
[68145.181675]   .nr_spread_over                : 0
[68145.181676]   .nr_running                    : 0
[68145.181677]   .load                          : 0
[68145.181678]   .load_avg                      : 0.000000
[68145.181680]   .load_period                   : 0.000000
[68145.181681]   .load_contrib                  : 0
[68145.181683]   .load_tg                       : 0
[68145.181684] 
[68145.181685] rt_rq[3]:
[68145.181686]   .rt_nr_running                 : 0
[68145.181687]   .rt_throttled                  : 0
[68145.181689]   .rt_time                       : 0.004236
[68145.181691]   .rt_runtime                    : 950.000000
[68145.181693] 
[68145.181693] runnable tasks:
[68145.181694]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[68145.181695] ----------------------------------------------------------------------------------------------------------
[68145.181712] R  btrfs-endio-3 13855  24275012.987757    163715   120               0               0               0.000000               0.000000               0.000000 /
[68145.181719] 
[68151.449220] kworker/0:0: page allocation failure: order:1, mode:0x4020
[68151.449225] Pid: 9004, comm: kworker/0:0 Tainted: G         C O 3.4.4-amd64-preempt-noide-20120410 #1
[68151.449227] Call Trace:
[68151.449228]  <IRQ>  [<ffffffff810bf8d8>] ? warn_alloc_failed+0x11f/0x132
[68151.449239]  [<ffffffff810453ff>] ? __mod_timer+0x13a/0x14c
[68151.449243]  [<ffffffff810c26e2>] ? __alloc_pages_nodemask+0x72f/0x7df
[68151.449256]  [<ffffffff810ecf10>] ? alloc_pages_current+0xc7/0xe4
[68151.449266]  [<ffffffffa04ca247>] ? iwlagn_rx_allocate+0x97/0x24d [iwlwifi]
[68151.449273]  [<ffffffffa04cb27b>] ? iwl_irq_tasklet+0x6e4/0x838 [iwlwifi]
[68151.449279]  [<ffffffff8103f6e9>] ? tasklet_action+0x79/0xc8
[68151.449283]  [<ffffffff8103f205>] ? __do_softirq+0xc0/0x188
[68151.449288]  [<ffffffff8136879c>] ? call_softirq+0x1c/0x30
[68151.449293]  [<ffffffff8100fcf9>] ? do_softirq+0x3c/0x7b
[68151.449297]  [<ffffffff8103f4f3>] ? irq_exit+0x3d/0xa5
[68151.449301]  [<ffffffff8100fa1e>] ? do_IRQ+0x81/0x97
[68151.449306]  [<ffffffff81362e2a>] ? common_interrupt+0x6a/0x6a
[68151.449308]  <EOI>  [<ffffffff811b7bca>] ? idr_get_next+0x7d/0x92
[68151.449318]  [<ffffffff81088d74>] ? css_get_next+0x59/0x97
[68151.449322]  [<ffffffff810fc115>] ? mem_cgroup_iter+0x109/0x1ab
[68151.449328]  [<ffffffff810cb31a>] ? shrink_zone+0x89/0x9b
[68151.449333]  [<ffffffff810cb73c>] ? do_try_to_free_pages+0x1e4/0x434
[68151.449338]  [<ffffffff810cbc11>] ? try_to_free_pages+0xb3/0xf9
[68151.449343]  [<ffffffff810c24a2>] ? __alloc_pages_nodemask+0x4ef/0x7df
[68151.449349]  [<ffffffff810ecf10>] ? alloc_pages_current+0xc7/0xe4
[68151.449356]  [<ffffffffa04ca247>] ? iwlagn_rx_allocate+0x97/0x24d [iwlwifi]
[68151.449363]  [<ffffffffa04ca81e>] ? iwlagn_rx_replenish+0x3a/0x3a [iwlwifi]
[68151.449369]  [<ffffffffa04ca7fc>] ? iwlagn_rx_replenish+0x18/0x3a [iwlwifi]
[68151.449373]  [<ffffffff8104ea7d>] ? process_one_work+0x16d/0x298
[68151.449379]  [<ffffffff8104f4d9>] ? worker_thread+0xc2/0x145
[68151.449383]  [<ffffffff8104f417>] ? manage_workers.isra.23+0x15b/0x15b
[68151.449386]  [<ffffffff81052788>] ? kthread+0x7d/0x85
[68151.449390]  [<ffffffff813686a4>] ? kernel_thread_helper+0x4/0x10
[68151.449395]  [<ffffffff8105270b>] ? kthread_freezable_should_stop+0x37/0x37
[68151.449399]  [<ffffffff813686a0>] ? gs_change+0x13/0x13
[68151.449401] Mem-Info:
[68151.449403] Node 0 DMA per-cpu:
[68151.449406] CPU    0: hi:    0, btch:   1 usd:   0
[68151.449409] CPU    1: hi:    0, btch:   1 usd:   0
[68151.449411] CPU    2: hi:    0, btch:   1 usd:   0
[68151.449413] CPU    3: hi:    0, btch:   1 usd:   0
[68151.449415] Node 0 DMA32 per-cpu:
[68151.449418] CPU    0: hi:  186, btch:  31 usd:  40
[68151.449421] CPU    1: hi:  186, btch:  31 usd:   0
[68151.449423] CPU    2: hi:  186, btch:  31 usd:   0
[68151.449425] CPU    3: hi:  186, btch:  31 usd:   0
[68151.449427] Node 0 Normal per-cpu:
[68151.449430] CPU    0: hi:  186, btch:  31 usd:  34
[68151.449432] CPU    1: hi:  186, btch:  31 usd:  28
[68151.449434] CPU    2: hi:  186, btch:  31 usd:   0
[68151.449436] CPU    3: hi:  186, btch:  31 usd:   0
[68151.449441] active_anon:722200 inactive_anon:259222 isolated_anon:0
[68151.449442]  active_file:278984 inactive_file:316208 isolated_file:0
[68151.449443]  unevictable:1 dirty:180847 writeback:0 unstable:0
[68151.449444]  free:40350 slab_reclaimable:31197 slab_unreclaimable:22223
[68151.449446]  mapped:306850 shmem:33690 pagetables:19223 bounce:0
[68151.449449] Node 0 DMA free:15904kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15680kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[68151.449461] lowmem_reserve[]: 0 3257 7777 7777
[68151.449468] Node 0 DMA32 free:81536kB min:28252kB low:35312kB high:42376kB active_anon:1292064kB inactive_anon:514608kB active_file:593820kB inactive_file:686892kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3335900kB mlocked:0kB dirty:332952kB writeback:0kB mapped:185252kB shmem:48512kB slab_reclaimable:40220kB slab_unreclaimable:22240kB kernel_stack:1600kB pagetables:15604kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[68151.449480] lowmem_reserve[]: 0 0 4519 4519
[68151.449486] Node 0 Normal free:63960kB min:39196kB low:48992kB high:58792kB active_anon:1596736kB inactive_anon:522280kB active_file:522116kB inactive_file:577940kB unevictable:4kB isolated(anon):0kB isolated(file):0kB present:4627820kB mlocked:4kB dirty:390436kB writeback:0kB mapped:1042148kB shmem:86248kB slab_reclaimable:84568kB slab_unreclaimable:66652kB kernel_stack:4920kB pagetables:61288kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:34 all_unreclaimable? no
[68151.449500] lowmem_reserve[]: 0 0 0 0
[68151.449506] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15904kB
[68151.449521] Node 0 DMA32: 19072*4kB 12*8kB 0*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 1*4096kB = 81600kB
[68151.449537] Node 0 Normal: 14790*4kB 86*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 63944kB
[68151.449552] 664276 total pagecache pages
[68151.449555] 35263 pages in swap cache
[68151.449557] Swap cache stats: add 373242, delete 337979, find 73039/82302
[68151.449560] Free swap  = 3029760kB
[68151.449562] Total swap = 4106248kB
[68151.475328] 2057712 pages RAM
[68151.475330] 340431 pages reserved
[68151.475331] 589330 pages shared
[68151.475332] 1252827 pages non-shared
[68151.475337] iwlwifi 0000:03:00.0: Failed to alloc_pages with GFP_ATOMIC.Only 5 free buffers remaining.
[69042.086472] btrfs: unlinked 9 orphans
[69042.307617] btrfs: unlinked 78 orphans
[69042.434375] btrfs: unlinked 1 orphans
[69042.598993] btrfs: unlinked 27 orphans
[69647.593815] nfs: server gargamel not responding, still trying
[69647.593830] nfs: server gargamel not responding, still trying
[69647.593835] nfs: server gargamel not responding, still trying
[69647.593840] nfs: server gargamel not responding, still trying
[69647.593845] nfs: server gargamel not responding, still trying
(...)
[69658.194846] nfs: server gargamel not responding, still trying
[69658.194848] nfs: server gargamel not responding, still trying
[69658.194849] nfs: server gargamel not responding, still trying
[69671.113619] nfs: server gargamel OK
[69671.456875] nfs: server gargamel OK
[69673.105176] SysRq : Show Blocked State
[69673.105182]   task                        PC stack   pid father
[69673.105325] automount       D ffff88020e94e810     0 10363      1 0x00000080
[69673.105330]  ffff88020e94e810 0000000000000086 ffff88021142f0c0 ffff880002250140
[69673.105334]  0000000000013680 ffff8800099e3fd8 ffff8800099e3fd8 ffff88020e94e810
[69673.105338]  ffff88021142f0c0 7fffffffffffffff 7fffffffffffffff ffff8800099e3480
[69673.105342] Call Trace:
[69673.105350]  [<ffffffff81360e2b>] ? schedule_timeout+0x2c/0xdb
[69673.105355]  [<ffffffff8104e736>] ? __queue_work+0x22e/0x256
[69673.105360]  [<ffffffff8105dd51>] ? get_parent_ip+0x9/0x1b
[69673.105363]  [<ffffffff8136583e>] ? sub_preempt_count+0x83/0x94
[69673.105366]  [<ffffffff81361967>] ? wait_for_common+0x9a/0x112
[69673.105369]  [<ffffffff8105f46a>] ? try_to_wake_up+0x1b4/0x1b4
[69673.105372]  [<ffffffff8104c785>] ? call_usermodehelper_exec+0x9b/0xdd
[69673.105376]  [<ffffffff811679f6>] ? call_sbin_request_key+0x265/0x2d1
[69673.105385]  [<ffffffff81167d75>] ? request_key_and_link+0x313/0x39a
[69673.105389]  [<ffffffff810f3e0c>] ? ____cache_alloc+0x19/0x205
[69673.105393]  [<ffffffff81167eae>] ? request_key+0x37/0x69
[69673.105396]  [<ffffffff810570aa>] ? override_creds+0x1f/0x32
[69673.105411]  [<ffffffffa070196f>] ? nfs_idmap_request_key+0xe6/0x18b [nfs]
[69673.105422]  [<ffffffffa0701b25>] ? nfs_idmap_lookup_id+0x53/0xca [nfs]
[69673.105432]  [<ffffffffa06f8389>] ? decode_getfattr_attrs+0x8de/0xcf4 [nfs]
[69673.105436]  [<ffffffff81061521>] ? set_next_entity+0x32/0x52
[69673.105445]  [<ffffffffa06fdcb0>] ? nfs4_xdr_dec_lookup+0x71/0x71 [nfs]
[69673.105454]  [<ffffffffa06fd665>] ? decode_getfattr_generic.constprop.83+0x7f/0xcd [nfs]
[69673.105464]  [<ffffffffa06fdcb0>] ? nfs4_xdr_dec_lookup+0x71/0x71 [nfs]
[69673.105475]  [<ffffffffa06fdcf9>] ? nfs4_xdr_dec_getattr+0x49/0x50 [nfs]
[69673.105484]  [<ffffffffa069be59>] ? rpc_make_runnable+0x6a/0x6a [sunrpc]
[69673.105492]  [<ffffffffa069d7f3>] ? rpcauth_unwrap_resp+0x9b/0xa8 [sunrpc]
[69673.105498]  [<ffffffffa069be59>] ? rpc_make_runnable+0x6a/0x6a [sunrpc]
[69673.105504]  [<ffffffffa0693f70>] ? call_decode+0x673/0x708 [sunrpc]
[69673.105509]  [<ffffffffa06938fd>] ? call_bc_transmit+0x109/0x109 [sunrpc]
[69673.105514]  [<ffffffffa06938fd>] ? call_bc_transmit+0x109/0x109 [sunrpc]
[69673.105520]  [<ffffffffa069c6a1>] ? __rpc_execute+0xbe/0x320 [sunrpc]
[69673.105524]  [<ffffffff81052c0e>] ? wake_up_bit+0xd/0x1e
[69673.105530]  [<ffffffffa06954e7>] ? rpc_run_task+0x77/0x7e [sunrpc]
[69673.105535]  [<ffffffffa06955dd>] ? rpc_call_sync+0x42/0x62 [sunrpc]
[69673.105545]  [<ffffffffa06ef1d4>] ? _nfs4_proc_getattr+0x8b/0x94 [nfs]
[69673.105555]  [<ffffffffa06f2629>] ? nfs4_proc_getattr+0x2b/0x48 [nfs]
[69673.105562]  [<ffffffffa06de0f5>] ? __nfs_revalidate_inode+0xab/0x194 [nfs]
[69673.105569]  [<ffffffffa06d94c3>] ? nfs_check_verifier+0x42/0x60 [nfs]
[69673.105575]  [<ffffffffa06d9607>] ? nfs_lookup_revalidate+0xf0/0x353 [nfs]
[69673.105582]  [<ffffffffa06db091>] ? nfs_open_revalidate+0x1bc/0x1cb [nfs]
[69673.105585]  [<ffffffff8136583e>] ? sub_preempt_count+0x83/0x94
[69673.105588]  [<ffffffff81362b37>] ? _raw_spin_unlock+0x25/0x31
[69673.105592]  [<ffffffff81113e7f>] ? __d_lookup+0xd1/0xe3
[69673.105594]  [<ffffffff8110bf3f>] ? inode_permission+0x65/0xda
[69673.105597]  [<ffffffff8110ba37>] ? walk_component+0x202/0x3c8
[69673.105601]  [<ffffffff811be7eb>] ? put_dec+0x2e/0x33
[69673.105604]  [<ffffffff8110c81a>] ? path_lookupat+0x7c/0x29b
[69673.105608]  [<ffffffff810fcc74>] ? __mem_cgroup_uncharge_common+0x100/0x1fd
[69673.105612]  [<ffffffff810c0392>] ? get_pageblock_flags_group+0x3a/0x6e
[69673.105616]  [<ffffffffa07bfb60>] ? find_autofs_mount+0x9f/0x9f [autofs4]
[69673.105619]  [<ffffffff8110ca55>] ? do_path_lookup+0x1c/0x87
[69673.105622]  [<ffffffffa07bfb60>] ? find_autofs_mount+0x9f/0x9f [autofs4]
[69673.105625]  [<ffffffff8110cc79>] ? kern_path+0x1d/0x3a
[69673.105629]  [<ffffffffa07c0020>] ? _autofs_dev_ioctl+0x82/0x32e [autofs4]
[69673.105632]  [<ffffffff810f49aa>] ? __kmalloc_track_caller+0xf6/0x108
[69673.105636]  [<ffffffffa07bfbb5>] ? autofs_dev_ioctl_ismountpoint+0x55/0x137 [autofs4]
[69673.105640]  [<ffffffffa07c0239>] ? _autofs_dev_ioctl+0x29b/0x32e [autofs4]
[69673.105644]  [<ffffffffa07c02d7>] ? autofs_dev_ioctl+0xb/0xf [autofs4]
[69673.105647]  [<ffffffff81110091>] ? do_vfs_ioctl+0x403/0x444
[69673.105650]  [<ffffffff810dcecf>] ? do_munmap+0x2da/0x2f3
[69673.105653]  [<ffffffff8111011d>] ? sys_ioctl+0x4b/0x72
[69673.105657]  [<ffffffff8136723d>] ? system_call_fastpath+0x1a/0x1f
[69673.105666] mc              D ffff8802134ff750     0  6526  21734 0x00000080
[69673.105670]  ffff8802134ff750 0000000000000086 0000000000000008 ffff880175d69890
[69673.105673]  0000000000013680 ffff88001a98dfd8 ffff88001a98dfd8 ffff8802134ff750
[69673.105677]  ffff8801a299ede0 ffff88021e213680 ffff8802134ff750 ffffffff810bb429
[69673.105681] Call Trace:
[69673.105684]  [<ffffffff810bb429>] ? __lock_page+0x66/0x66
[69673.105687]  [<ffffffff81362059>] ? io_schedule+0x55/0x6b
[69673.105690]  [<ffffffff810bb42f>] ? sleep_on_page+0x6/0xa
[69673.105692]  [<ffffffff81361054>] ? __wait_on_bit+0x3e/0x71
[69673.105696]  [<ffffffff810bb577>] ? wait_on_page_bit+0x6e/0x73
[69673.105699]  [<ffffffff81052e69>] ? autoremove_wake_function+0x2a/0x2a
[69673.105731]  [<ffffffffa0199d18>] ? read_extent_buffer_pages+0x1fb/0x24b [btrfs]
[69673.105745]  [<ffffffffa0175112>] ? lock_page+0x11/0x11 [btrfs]
[69673.105758]  [<ffffffffa0176e69>] ? btree_read_extent_buffer_pages.constprop.110+0x5c/0xf3 [btrfs]
[69673.105771]  [<ffffffffa0177412>] ? read_tree_block+0x25/0x2d [btrfs]
[69673.105782]  [<ffffffffa0163087>] ? read_block_for_search.isra.32+0x2af/0x2e7 [btrfs]
[69673.105793]  [<ffffffffa01649c2>] ? btrfs_search_slot+0x48d/0x659 [btrfs]
[69673.105805]  [<ffffffffa0172d7d>] ? btrfs_lookup_csum+0x66/0x106 [btrfs]
[69673.105808]  [<ffffffff8136583e>] ? sub_preempt_count+0x83/0x94
[69673.105820]  [<ffffffffa0172fad>] ? __btrfs_lookup_bio_sums+0x190/0x30a [btrfs]
[69673.105834]  [<ffffffffa017d021>] ? btrfs_submit_bio_hook+0xa9/0x12a [btrfs]
[69673.105849]  [<ffffffffa0194af3>] ? submit_one_bio+0x82/0xb9 [btrfs]
[69673.105863]  [<ffffffffa01977c7>] ? submit_extent_page.isra.26+0x10a/0x1b2 [btrfs]
[69673.105876]  [<ffffffffa0196d91>] ? repair_io_failure+0x18e/0x18e [btrfs]
[69673.105889]  [<ffffffffa0197cf5>] ? __extent_read_full_page+0x424/0x4be [btrfs]
[69673.105902]  [<ffffffffa0196d91>] ? repair_io_failure+0x18e/0x18e [btrfs]
[69673.105916]  [<ffffffffa017eca4>] ? btrfs_writepage+0x4b/0x4b [btrfs]
[69673.105920]  [<ffffffff810c56b3>] ? __lru_cache_add+0x7b/0x8d
[69673.105934]  [<ffffffffa01988e8>] ? extent_readpages+0xaf/0xf3 [btrfs]
[69673.105947]  [<ffffffffa017eca4>] ? btrfs_writepage+0x4b/0x4b [btrfs]
[69673.105950]  [<ffffffff810c4640>] ? __do_page_cache_readahead+0x139/0x1de
[69673.105953]  [<ffffffff810c493d>] ? ra_submit+0x19/0x1d
[69673.105956]  [<ffffffff810bca23>] ? generic_file_aio_read+0x2b0/0x5d3
[69673.105960]  [<ffffffff81110627>] ? set_fd_set+0x23/0x26
[69673.105963]  [<ffffffff8110252a>] ? do_sync_read+0xab/0xe3
[69673.105967]  [<ffffffff81362b73>] ? _raw_spin_unlock_irqrestore+0x30/0x3e
[69673.105971]  [<ffffffff810595b4>] ? __wake_up+0x35/0x46
[69673.105973]  [<ffffffff81102c22>] ? vfs_read+0x9f/0xe6
[69673.105977]  [<ffffffff81103c2d>] ? fget_light+0x33/0x8d
[69673.105979]  [<ffffffff81102cae>] ? sys_read+0x45/0x6b
[69673.105983]  [<ffffffff8136723d>] ? system_call_fastpath+0x1a/0x1f
[69673.105986] mplayer         D ffff8802017ef040     0  8109    538 0x00000080
[69673.105990]  ffff8802017ef040 0000000000000086 ffff88012ff54700 ffff880121d77080
[69673.105993]  0000000000013680 ffff88013fc0dfd8 ffff88013fc0dfd8 ffff8802017ef040
[69673.105996]  0000000000000000 ffff88013fc0d890 ffff88021e5b5208 ffffffff8116761a
[69673.106000] Call Trace:
[69673.106003]  [<ffffffff8116761a>] ? wait_for_key_construction+0x59/0x59
[69673.106005]  [<ffffffff81167620>] ? key_wait_bit+0x6/0xa
[69673.106008]  [<ffffffff81361054>] ? __wait_on_bit+0x3e/0x71
[69673.106011]  [<ffffffff813610f6>] ? out_of_line_wait_on_bit+0x6f/0x78
[69673.106014]  [<ffffffff8116761a>] ? wait_for_key_construction+0x59/0x59
[69673.106017]  [<ffffffff81052e69>] ? autoremove_wake_function+0x2a/0x2a
[69673.106020]  [<ffffffff811675fb>] ? wait_for_key_construction+0x3a/0x59
[69673.106022]  [<ffffffff81167ec3>] ? request_key+0x4c/0x69
[69673.106025]  [<ffffffff810570aa>] ? override_creds+0x1f/0x32
[69673.106036]  [<ffffffffa070196f>] ? nfs_idmap_request_key+0xe6/0x18b [nfs]
[69673.106046]  [<ffffffffa0701b25>] ? nfs_idmap_lookup_id+0x53/0xca [nfs]
[69673.106056]  [<ffffffffa06f8389>] ? decode_getfattr_attrs+0x8de/0xcf4 [nfs]
[69673.106059]  [<ffffffff81061521>] ? set_next_entity+0x32/0x52
[69673.106069]  [<ffffffffa06fdcb0>] ? nfs4_xdr_dec_lookup+0x71/0x71 [nfs]
[69673.106078]  [<ffffffffa06fd665>] ? decode_getfattr_generic.constprop.83+0x7f/0xcd [nfs]
[69673.106087]  [<ffffffffa06fdcb0>] ? nfs4_xdr_dec_lookup+0x71/0x71 [nfs]
[69673.106096]  [<ffffffffa06fdcf9>] ? nfs4_xdr_dec_getattr+0x49/0x50 [nfs]
[69673.106104]  [<ffffffffa069be59>] ? rpc_make_runnable+0x6a/0x6a [sunrpc]
[69673.106111]  [<ffffffffa069d7f3>] ? rpcauth_unwrap_resp+0x9b/0xa8 [sunrpc]
[69673.106117]  [<ffffffffa069be59>] ? rpc_make_runnable+0x6a/0x6a [sunrpc]
[69673.106122]  [<ffffffffa0693f70>] ? call_decode+0x673/0x708 [sunrpc]
[69673.106128]  [<ffffffffa06938fd>] ? call_bc_transmit+0x109/0x109 [sunrpc]
[69673.106133]  [<ffffffffa06938fd>] ? call_bc_transmit+0x109/0x109 [sunrpc]
[69673.106139]  [<ffffffffa069c6a1>] ? __rpc_execute+0xbe/0x320 [sunrpc]
[69673.106141]  [<ffffffff81052c0e>] ? wake_up_bit+0xd/0x1e
[69673.106147]  [<ffffffffa06954e7>] ? rpc_run_task+0x77/0x7e [sunrpc]
[69673.106152]  [<ffffffffa06955dd>] ? rpc_call_sync+0x42/0x62 [sunrpc]
[69673.106162]  [<ffffffffa06ef1d4>] ? _nfs4_proc_getattr+0x8b/0x94 [nfs]
[69673.106172]  [<ffffffffa06f2629>] ? nfs4_proc_getattr+0x2b/0x48 [nfs]
[69673.106179]  [<ffffffffa06de0f5>] ? __nfs_revalidate_inode+0xab/0x194 [nfs]
[69673.106186]  [<ffffffffa06de21a>] ? nfs_revalidate_mapping+0x3c/0xf5 [nfs]
[69673.106193]  [<ffffffffa06db88d>] ? nfs_file_read+0x8d/0xce [nfs]
[69673.106196]  [<ffffffff8110252a>] ? do_sync_read+0xab/0xe3
[69673.106199]  [<ffffffff810f3bb4>] ? kfree+0x65/0x76
[69673.106204]  [<ffffffffa035f8d6>] ? snd_timer_user_read+0x10a/0x1b4 [snd_timer]
[69673.106207]  [<ffffffff81102c22>] ? vfs_read+0x9f/0xe6
[69673.106209]  [<ffffffff81103c2d>] ? fget_light+0x33/0x8d
[69673.106212]  [<ffffffff81102cae>] ? sys_read+0x45/0x6b
[69673.106215]  [<ffffffff8136723d>] ? system_call_fastpath+0x1a/0x1f

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.4.4/amd64 full interrupt hangs under big nfs copies
  2012-07-16 15:18                             ` Marc MERLIN
@ 2012-07-16 16:21                               ` Eric Dumazet
  2012-07-16 17:17                                 ` Marc MERLIN
  2013-02-19  4:05                                 ` 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out Marc MERLIN
  0 siblings, 2 replies; 72+ messages in thread
From: Eric Dumazet @ 2012-07-16 16:21 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: David Miller, Larry.Finger, bhutchings, linux-wireless, netdev

On Mon, 2012-07-16 at 08:18 -0700, Marc MERLIN wrote:
> On Mon, Jul 16, 2012 at 08:18:49AM +0200, Eric Dumazet wrote:
> > > My understanding is that user space calling drivers that shut off all
> > > interrupts for extended periods of time (as least I think so since my mouse
> > > cursor would not move), is still a kernel bug.
> > > 
> > > For what it's worth, copying 1GB of data in lots of small files does not
> > > cause problems, it seems that it's big files that cause a problem since they
> > > likely fill a buffer somewhere while interrupts are disabled.
> > > 
> > > Do you have an idea of how I can find out where my mc process is stuck in
> > > the kernel?
> > > Should I reproduce with specific sysrq output?
> > 
> > Just to clarify, you get this freeze when transferring a big file from a
> > remote NFS server to your PC, (aka a download), not the reverse way ?
>  
> No, it's atually when I'm 'uploading' from my laptop to my server.
> One interesting thing is that my server is running lvm2 with snapshots,
> which makes writes slower than my laptop can push data over the network, so
> it's definitely causing buffers to fill up.
> I just did a download test and got 4.5MB/s sustained without problems.

Hmm, nfs apparently is able to push lot of data, try to reduce
rsize/wsize to sane values, like 32K instead of 512K ?

gargamel:/mnt/dshelf2/ /net/gargamel/mnt/dshelf2 nfs4
rw,nosuid,nodev,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.205.7,local_lock=none,addr=192.168.205.3 0 0

You could trace svc_sock_setbufsize() and check how large is set
sk_sndbuf

(iwlwifi is unable to use sendpage anyway, since SG is not enabled)





^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.4.4/amd64 full interrupt hangs under big nfs copies
  2012-07-16 16:21                               ` Eric Dumazet
@ 2012-07-16 17:17                                 ` Marc MERLIN
  2013-02-19  4:05                                 ` 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out Marc MERLIN
  1 sibling, 0 replies; 72+ messages in thread
From: Marc MERLIN @ 2012-07-16 17:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Larry.Finger, bhutchings, linux-wireless, netdev

On Mon, Jul 16, 2012 at 06:21:57PM +0200, Eric Dumazet wrote:
> > No, it's atually when I'm 'uploading' from my laptop to my server.
> > One interesting thing is that my server is running lvm2 with snapshots,
> > which makes writes slower than my laptop can push data over the network, so
> > it's definitely causing buffers to fill up.
> > I just did a download test and got 4.5MB/s sustained without problems.
> 
> Hmm, nfs apparently is able to push lot of data, try to reduce
> rsize/wsize to sane values, like 32K instead of 512K ?
> 
> gargamel:/mnt/dshelf2/ /net/gargamel/mnt/dshelf2 nfs4
> rw,nosuid,nodev,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.205.7,local_lock=none,addr=192.168.205.3 0 0

Nice catch. That seems like an excessive default from autofs5 5.0.4-3.2+b1

So, it helped. I still got hangs, but this time they were VFS hangs. I
couldn't do anything filesystem related durign the 'hangs', but the
interrupts weren't hung anymore, so I could move my mouse cursor.

Having NFS hang all of VFS and local disk is obviously still a problem, but
at this point it may not be a networking (or wireless) related problem.

I'll attach the relevant logs during that attempt. Does that help?

Thanks,
Marc

[76903.011101] SysRq : Show Blocked State
[76903.011110]   task                        PC stack   pid father
[76903.011306] mc              D ffff88021e2d3680     0  9383   9270 0x00000080
[76903.011314]  ffff880111094100 0000000000000082 000000000000000e ffff880213549140
[76903.011322]  0000000000013680 ffff8800140e3fd8 ffff8800140e3fd8 ffff880111094100
[76903.011328]  ffff88021e5c5258 0000000000000000 ffff880111094100 ffff8800140e3e40
[76903.011335] Call Trace:
[76903.011362]  [<ffffffffa06dcdf2>] ? nfs_find_actor+0x66/0x66 [nfs]
[76903.011376]  [<ffffffffa06dce4d>] ? nfs_wait_bit_killable+0x5b/0x6e [nfs]
[76903.011384]  [<ffffffff81360f55>] ? __wait_on_bit_lock+0x3c/0x85
[76903.011391]  [<ffffffff810bb793>] ? filemap_fdatawait_range+0x11b/0x139
[76903.011397]  [<ffffffff8136100d>] ? out_of_line_wait_on_bit_lock+0x6f/0x78
[76903.011410]  [<ffffffffa06dcdf2>] ? nfs_find_actor+0x66/0x66 [nfs]
[76903.011417]  [<ffffffff81052e69>] ? autoremove_wake_function+0x2a/0x2a
[76903.011435]  [<ffffffffa06e8ca2>] ? nfs_commit_inode+0x66/0x27a [nfs]
[76903.011448]  [<ffffffffa06db56e>] ? nfs_file_fsync+0x95/0xf3 [nfs]
[76903.011455]  [<ffffffff811015a9>] ? filp_close+0x3b/0x6a
[76903.011461]  [<ffffffff8110165e>] ? sys_close+0x86/0xc7
[76903.011467]  [<ffffffff8136723d>] ? system_call_fastpath+0x1a/0x1f
[76903.011482] kworker/0:0     D ffff88021e213680     0 13850      2 0x00000080
[76903.011489]  ffff8801fac7d850 0000000000000046 ffff8802117cb848 ffff880140773750
[76903.011495]  0000000000013680 ffff88004c4e7fd8 ffff88004c4e7fd8 ffff8801fac7d850
[76903.011502]  ffff88021e5df9a0 0000000000000000 ffff8801fac7d850 ffffffffa069be59
[76903.011508] Call Trace:
[76903.011524]  [<ffffffffa069be59>] ? rpc_make_runnable+0x6a/0x6a [sunrpc]
[76903.011535]  [<ffffffffa069beb2>] ? rpc_wait_bit_killable+0x59/0x6c [sunrpc]
[76903.011541]  [<ffffffff81361054>] ? __wait_on_bit+0x3e/0x71
[76903.011547]  [<ffffffff81362b73>] ? _raw_spin_unlock_irqrestore+0x30/0x3e
[76903.011553]  [<ffffffff813610f6>] ? out_of_line_wait_on_bit+0x6f/0x78
[76903.011565]  [<ffffffffa069be59>] ? rpc_make_runnable+0x6a/0x6a [sunrpc]
[76903.011570]  [<ffffffff81052e69>] ? autoremove_wake_function+0x2a/0x2a
[76903.011587]  [<ffffffffa06e7bdf>] ? nfs_initiate_commit+0xf4/0x105 [nfs]
[76903.011604]  [<ffffffffa06e8e30>] ? nfs_commit_inode+0x1f4/0x27a [nfs]
[76903.011617]  [<ffffffffa06db97c>] ? nfs_release_page+0x56/0x73 [nfs]
[76903.011626]  [<ffffffff810ca356>] ? shrink_page_list+0x556/0x739
[76903.011635]  [<ffffffff8105dd51>] ? get_parent_ip+0x9/0x1b
[76903.011640]  [<ffffffff8136583e>] ? sub_preempt_count+0x83/0x94
[76903.011646]  [<ffffffff810c91eb>] ? update_isolated_counts.isra.44+0x148/0x16e
[76903.011653]  [<ffffffff810ca9a3>] ? shrink_inactive_list+0x2b1/0x446
[76903.011661]  [<ffffffff810cb182>] ? shrink_mem_cgroup_zone+0x371/0x480
[76903.011668]  [<ffffffff810cb2f3>] ? shrink_zone+0x62/0x9b
[76903.011675]  [<ffffffff810cb73c>] ? do_try_to_free_pages+0x1e4/0x434
[76903.011682]  [<ffffffff810cbc11>] ? try_to_free_pages+0xb3/0xf9
[76903.011688]  [<ffffffff8105931b>] ? should_resched+0x5/0x23
[76903.011695]  [<ffffffff810c24a2>] ? __alloc_pages_nodemask+0x4ef/0x7df
[76903.011702]  [<ffffffff8105dd51>] ? get_parent_ip+0x9/0x1b
[76903.011711]  [<ffffffff810ecf10>] ? alloc_pages_current+0xc7/0xe4
[76903.011723]  [<ffffffffa04ca247>] ? iwlagn_rx_allocate+0x97/0x24d [iwlwifi]
[76903.011734]  [<ffffffffa04ca81e>] ? iwlagn_rx_replenish+0x3a/0x3a [iwlwifi]
[76903.011744]  [<ffffffffa04ca7fc>] ? iwlagn_rx_replenish+0x18/0x3a [iwlwifi]
[76903.011750]  [<ffffffff8104ea7d>] ? process_one_work+0x16d/0x298
[76903.011757]  [<ffffffff8104f4d9>] ? worker_thread+0xc2/0x145
[76903.011763]  [<ffffffff8104f417>] ? manage_workers.isra.23+0x15b/0x15b
[76903.011768]  [<ffffffff81052788>] ? kthread+0x7d/0x85
[76903.011774]  [<ffffffff813686a4>] ? kernel_thread_helper+0x4/0x10
[76903.011780]  [<ffffffff8105270b>] ? kthread_freezable_should_stop+0x37/0x37
[76903.011786]  [<ffffffff813686a0>] ? gs_change+0x13/0x13
[76903.011797] Sched Debug Version: v0.10, 3.4.4-amd64-preempt-noide-20120410 #1

and

[76843.153742] 
[76873.080978] SysRq : Show Blocked State
[76873.080987]   task                        PC stack   pid father
[76873.081200] mc              D ffff88021e293680     0  9383   9270 0x00000080
[76873.081208]  ffff880111094100 0000000000000082 0000000000000001 ffff8802135107d0
[76873.081216]  0000000000013680 ffff8800140e3fd8 ffff8800140e3fd8 ffff880111094100
[76873.081222]  ffff88010c9033d0 ffff88021e293680 ffff880111094100 ffffffff810bb429
[76873.081229] Call Trace:
[76873.081241]  [<ffffffff810bb429>] ? __lock_page+0x66/0x66
[76873.081249]  [<ffffffff81362059>] ? io_schedule+0x55/0x6b
[76873.081254]  [<ffffffff810bb42f>] ? sleep_on_page+0x6/0xa
[76873.081260]  [<ffffffff81361054>] ? __wait_on_bit+0x3e/0x71
[76873.081265]  [<ffffffff810bb577>] ? wait_on_page_bit+0x6e/0x73
[76873.081272]  [<ffffffff81052e69>] ? autoremove_wake_function+0x2a/0x2a
[76873.081278]  [<ffffffff810bb6ec>] ? filemap_fdatawait_range+0x74/0x139
[76873.081285]  [<ffffffff810bc2e8>] ? filemap_write_and_wait_range+0x3b/0x4d
[76873.081308]  [<ffffffffa06db536>] ? nfs_file_fsync+0x5d/0xf3 [nfs]
[76873.081317]  [<ffffffff811015a9>] ? filp_close+0x3b/0x6a
[76873.081323]  [<ffffffff8110165e>] ? sys_close+0x86/0xc7
[76873.081330]  [<ffffffff8136723d>] ? system_call_fastpath+0x1a/0x1f
[76873.081346] kworker/0:0     D ffff88021e213680     0 13850      2 0x00000080
[76873.081352]  ffff8801fac7d850 0000000000000046 ffff880186753ce8 ffff880126d7f040
[76873.081358]  0000000000013680 ffff88004c4e7fd8 ffff88004c4e7fd8 ffff8801fac7d850
[76873.081365]  ffff8801c5ae1d70 ffff88021e213680 ffff8801fac7d850 ffffffff810bb429
[76873.081371] Call Trace:
[76873.081376]  [<ffffffff810bb429>] ? __lock_page+0x66/0x66
[76873.081381]  [<ffffffff81362059>] ? io_schedule+0x55/0x6b
[76873.081386]  [<ffffffff810bb42f>] ? sleep_on_page+0x6/0xa
[76873.081391]  [<ffffffff81361054>] ? __wait_on_bit+0x3e/0x71
[76873.081396]  [<ffffffff810bb577>] ? wait_on_page_bit+0x6e/0x73
[76873.081402]  [<ffffffff81052e69>] ? autoremove_wake_function+0x2a/0x2a
[76873.081411]  [<ffffffff810c9f66>] ? shrink_page_list+0x166/0x739
[76873.081420]  [<ffffffff8105dd51>] ? get_parent_ip+0x9/0x1b
[76873.081425]  [<ffffffff8136583e>] ? sub_preempt_count+0x83/0x94
[76873.081431]  [<ffffffff810c91eb>] ? update_isolated_counts.isra.44+0x148/0x16e
[76873.081438]  [<ffffffff810ca9a3>] ? shrink_inactive_list+0x2b1/0x446
[76873.081446]  [<ffffffff810cb182>] ? shrink_mem_cgroup_zone+0x371/0x480
[76873.081454]  [<ffffffff810cb2f3>] ? shrink_zone+0x62/0x9b
[76873.081460]  [<ffffffff810cb73c>] ? do_try_to_free_pages+0x1e4/0x434
[76873.081467]  [<ffffffff810cbc11>] ? try_to_free_pages+0xb3/0xf9
[76873.081473]  [<ffffffff8105931b>] ? should_resched+0x5/0x23
[76873.081481]  [<ffffffff810c24a2>] ? __alloc_pages_nodemask+0x4ef/0x7df
[76873.081487]  [<ffffffff8105dd51>] ? get_parent_ip+0x9/0x1b
[76873.081497]  [<ffffffff810ecf10>] ? alloc_pages_current+0xc7/0xe4
[76873.081510]  [<ffffffffa04ca247>] ? iwlagn_rx_allocate+0x97/0x24d [iwlwifi]
[76873.081521]  [<ffffffffa04ca81e>] ? iwlagn_rx_replenish+0x3a/0x3a [iwlwifi]
[76873.081530]  [<ffffffffa04ca7fc>] ? iwlagn_rx_replenish+0x18/0x3a [iwlwifi]
[76873.081538]  [<ffffffff8104ea7d>] ? process_one_work+0x16d/0x298
[76873.081545]  [<ffffffff8104f4d9>] ? worker_thread+0xc2/0x145
[76873.081551]  [<ffffffff8104f417>] ? manage_workers.isra.23+0x15b/0x15b
[76873.081556]  [<ffffffff81052788>] ? kthread+0x7d/0x85
[76873.081562]  [<ffffffff813686a4>] ? kernel_thread_helper+0x4/0x10
[76873.081568]  [<ffffffff8105270b>] ? kthread_freezable_should_stop+0x37/0x37
[76873.081574]  [<ffffffff813686a0>] ? gs_change+0x13/0x13
[76873.081585] 192.168.205.3-m D ffff88021e293680     0 14532      2 0x00000080
[76873.081590]  ffff880206d600c0 0000000000000046 ffff880186733e60 ffff88004b4230c0
[76873.081597]  0000000000013680 ffff880022305fd8 ffff880022305fd8 ffff880206d600c0
[76873.081603]  ffff88021e5bb778 0000000000000000 ffff880206d600c0 ffffffffa069be59
[76873.081609] Call Trace:
[76873.081625]  [<ffffffffa069be59>] ? rpc_make_runnable+0x6a/0x6a [sunrpc]
[76873.081637]  [<ffffffffa069beb2>] ? rpc_wait_bit_killable+0x59/0x6c [sunrpc]
[76873.081642]  [<ffffffff81361054>] ? __wait_on_bit+0x3e/0x71
[76873.081648]  [<ffffffff81362b73>] ? _raw_spin_unlock_irqrestore+0x30/0x3e
[76873.081654]  [<ffffffff813610f6>] ? out_of_line_wait_on_bit+0x6f/0x78
[76873.081665]  [<ffffffffa069be59>] ? rpc_make_runnable+0x6a/0x6a [sunrpc]
[76873.081671]  [<ffffffff81052e69>] ? autoremove_wake_function+0x2a/0x2a
[76873.081690]  [<ffffffffa06efb13>] ? nfs4_run_open_task+0x101/0x12e [nfs]
[76873.081709]  [<ffffffffa06f12fb>] ? nfs4_open_recover_helper+0xbd/0x13f [nfs]
[76873.081724]  [<ffffffffa06f13e1>] ? nfs4_open_recover+0x64/0x113 [nfs]
[76873.081740]  [<ffffffffa06f36a2>] ? nfs4_open_expired+0x69/0xc4 [nfs]
[76873.081761]  [<ffffffffa06ff5b8>] ? nfs4_do_reclaim+0x109/0x4a0 [nfs]
[76873.081779]  [<ffffffffa06fe7cb>] ? nfs4_state_clear_reclaim_reboot.part.7+0xf6/0x10a [nfs]
[76873.081797]  [<ffffffffa06ffcb2>] ? nfs4_run_state_manager+0x363/0x52e [nfs]
[76873.081814]  [<ffffffffa06ff94f>] ? nfs4_do_reclaim+0x4a0/0x4a0 [nfs]
[76873.081819]  [<ffffffff81052788>] ? kthread+0x7d/0x85
[76873.081825]  [<ffffffff813686a4>] ? kernel_thread_helper+0x4/0x10
[76873.081830]  [<ffffffff8105270b>] ? kthread_freezable_should_stop+0x37/0x37
[76873.081836]  [<ffffffff813686a0>] ? gs_change+0x13/0x13
[76873.081842] Sched Debug Version: v0.10, 3.4.4-amd64-preempt-noide-20120410 #1
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2012-07-16 16:21                               ` Eric Dumazet
  2012-07-16 17:17                                 ` Marc MERLIN
@ 2013-02-19  4:05                                 ` Marc MERLIN
  2013-02-19  5:17                                   ` Eric Dumazet
  1 sibling, 1 reply; 72+ messages in thread
From: Marc MERLIN @ 2013-02-19  4:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Larry.Finger, bhutchings, linux-wireless, netdev

On Mon, Jul 16, 2012 at 06:21:57PM +0200, Eric Dumazet wrote:
> > No, it's atually when I'm 'uploading' from my laptop to my server.
> > One interesting thing is that my server is running lvm2 with snapshots,
> > which makes writes slower than my laptop can push data over the network, so
> > it's definitely causing buffers to fill up.
> > I just did a download test and got 4.5MB/s sustained without problems.
> 
> Hmm, nfs apparently is able to push lot of data, try to reduce
> rsize/wsize to sane values, like 32K instead of 512K ?
> 
> gargamel:/mnt/dshelf2/ /net/gargamel/mnt/dshelf2 nfs4
> rw,nosuid,nodev,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.205.7,local_lock=none,addr=192.168.205.3 0 0
> 
> You could trace svc_sock_setbufsize() and check how large is set
> sk_sndbuf

My apologies, I totally dropped the ball on this.

So, the problem was still there in more recent kernels.

TL;DR:
- reducing nfs buffers removes the full hang
- iwlwifi has a problem where lack of pages causes the whoe machine to hang
- NFS copies out, even with buffers down to 32K is very wonky and cp does not
  return until over 2mn after the copy is actually finished.
  (I have a trace of what's hung in cp/nfs when this happens)


Details:

It's still pretty severe because whatever blocks doesn't just end up
blocking disk IO, but actually blocking interrupts altogether since my mouse
can't move for a minute or more until some buffer flushes.

The last trace I got during this (I can't do sysrq because I have a broken 
Lenovo T530 without a sysrq key, and typing doesn't really work when
interrupts aren't firing).

Not sure if it's useful. First chrome had an issue, and then iwlwifi

chrome: page allocation failure: order:1, mode:0x4020
Pid: 8730, comm: chrome Tainted: G           O 3.7.8-amd64-preempt-20121226-fixwd #1
Call Trace:
 <IRQ>  [<ffffffff810d5f38>] warn_alloc_failed+0x117/0x12c
 [<ffffffff810d8cfd>] __alloc_pages_nodemask+0x66a/0x702
 [<ffffffff8108a948>] ? arch_local_irq_save+0x15/0x1b
 [<ffffffff811064af>] alloc_pages_current+0xcd/0xee
 [<ffffffffa039b579>] iwl_rx_allocate+0x8c/0x271 [iwlwifi]
 [<ffffffffa039c24e>] iwl_irq_tasklet+0x7e5/0x91c [iwlwifi]
 [<ffffffff8104805e>] tasklet_action+0x80/0xd2
 [<ffffffff81047c99>] __do_softirq+0xdf/0x1c5
 [<ffffffff814c1ed6>] ? _raw_spin_lock+0x1b/0x1f
 [<ffffffff810a7f37>] ? handle_irq_event+0x4d/0x62
 [<ffffffff814c7f5c>] call_softirq+0x1c/0x30
 [<ffffffff8101104e>] do_softirq+0x41/0x7f
 [<ffffffff81047e52>] irq_exit+0x3f/0xa7
 [<ffffffff81010d40>] do_IRQ+0x88/0x9f
 [<ffffffff814c246d>] common_interrupt+0x6d/0x6d
 <EOI> Mem-Info:

iwlwifi 0000:03:00.0: Failed to alloc_pages with GFP_ATOMIC.Only 1 free buffers remaining.
swapper/0: page allocation failure: order:1, mode:0x4020
Pid: 0, comm: swapper/0 Tainted: G           O 3.7.8-amd64-preempt-20121226-fixwd #1
Call Trace:
 <IRQ>  [<ffffffff810d5f38>] warn_alloc_failed+0x117/0x12c
 [<ffffffff810d8cfd>] __alloc_pages_nodemask+0x66a/0x702
 [<ffffffff8108a948>] ? arch_local_irq_save+0x15/0x1b
 [<ffffffff811064af>] alloc_pages_current+0xcd/0xee
 [<ffffffffa039b579>] iwl_rx_allocate+0x8c/0x271 [iwlwifi]
 [<ffffffffa039c24e>] iwl_irq_tasklet+0x7e5/0x91c [iwlwifi]
 [<ffffffff810af981>] ? rcu_irq_exit+0x6f/0x74
 [<ffffffff8104805e>] tasklet_action+0x80/0xd2
 [<ffffffff81047c99>] __do_softirq+0xdf/0x1c5
 [<ffffffff814c1ed6>] ? _raw_spin_lock+0x1b/0x1f
 [<ffffffff810a7f37>] ? handle_irq_event+0x4d/0x62
 [<ffffffff814c7f5c>] call_softirq+0x1c/0x30
 [<ffffffff8101104e>] do_softirq+0x41/0x7f
 [<ffffffff81047e52>] irq_exit+0x3f/0xa7
 [<ffffffff81010d40>] do_IRQ+0x88/0x9f
 [<ffffffff814c246d>] common_interrupt+0x6d/0x6d
 <EOI>  [<ffffffff810151f9>] ? paravirt_read_tsc+0x9/0xd
 [<ffffffff812daf10>] ? intel_idle+0xe6/0x112
 [<ffffffff812daeef>] ? intel_idle+0xc5/0x112
 [<ffffffff813c6440>] cpuidle_enter+0x12/0x14
 [<ffffffff813c68d9>] cpuidle_enter_state+0x10/0x3a
 [<ffffffff813c69fb>] cpuidle_idle_call+0xf8/0x180
 [<ffffffff81016c21>] cpu_idle+0x9b/0xf9
 [<ffffffff8149ec77>] rest_init+0x7b/0x7f
 [<ffffffff81ac4ba4>] start_kernel+0x3c8/0x3d5
 [<ffffffff81ac45e4>] ? repair_env_string+0x56/0x56
 [<ffffffff81ac42d3>] x86_64_start_reservations+0xae/0xb2
 [<ffffffff81ac4120>] ? early_idt_handlers+0x120/0x120
 [<ffffffff81ac43d9>] x86_64_start_kernel+0x102/0x111
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
CPU    2: hi:    0, btch:   1 usd:   0
CPU    3: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd:  21
CPU    1: hi:  186, btch:  31 usd:  43
CPU    2: hi:  186, btch:  31 usd:  59
CPU    3: hi:  186, btch:  31 usd:  35
Node 0 Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd: 150
CPU    1: hi:  186, btch:  31 usd:  29
CPU    2: hi:  186, btch:  31 usd:  61
CPU    3: hi:  186, btch:  31 usd:  32
active_anon:1147756 inactive_anon:211701 isolated_anon:0
 active_file:89633 inactive_file:132474 isolated_file:15
 unevictable:2444 dirty:28383 writeback:32015 unstable:5873
 free:29900 slab_reclaimable:22370 slab_unreclaimable:23443
 mapped:315414 shmem:80562 pagetables:32697 bounce:0
 free_cma:0
Node 0 DMA free:15900kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15676kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3257 7777 7777
Node 0 DMA32 free:54852kB min:28252kB low:35312kB high:42376kB active_anon:2187940kB inactive_anon:437600kB active_file:131500kB inactive_file:217816kB unevictable:2556kB isolated(anon):0kB isolated(file):0kB present:3335900kB mlocked:2556kB dirty:43628kB writeback:59292kB mapped:315320kB shmem:150940kB slab_reclaimable:21884kB slab_unreclaimable:29608kB kernel_stack:3648kB pagetables:36140kB unstable:5060kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:230 all_unreclaimable? no
lowmem_reserve[]: 0 0 4519 4519
Node 0 Normal free:48848kB min:39196kB low:48992kB high:58792kB active_anon:2403084kB inactive_anon:409204kB active_file:227328kB inactive_file:311920kB unevictable:7220kB isolated(anon):0kB isolated(file):60kB present:4627820kB mlocked:7220kB dirty:69904kB writeback:68768kB mapped:946336kB shmem:171308kB slab_reclaimable:67596kB slab_unreclaimable:64164kB kernel_stack:5704kB pagetables:94648kB unstable:18432kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:130 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 1*4kB 1*8kB 1*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15900kB
Node 0 DMA32: 12419*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 54652kB
Node 0 Normal: 11218*4kB 38*8kB 0*16kB 0*32kB 2*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 49016kB
440338 total pagecache pages
136996 pages in swap cache
Swap cache stats: add 3377120, delete 3240124, find 1032307/1258857
Free swap  = 6034228kB
Total swap = 10485756kB
2057712 pages RAM
341095 pages reserved
1654166 pages shared
1340354 pages non-shared
iwlwifi 0000:03:00.0: Failed to alloc_pages with GFP_ATOMIC.Only 1 free buffers remaining.


The good news is that the system unlocks itself, and I eventually get to a
root shell where I can killall -9 cp, which kills the copy from local disk
to remote nfs server after usually a minute of wait for more buffers to
flush.

Per your recommendation, I changed my NFS buffers and /etc/auto.master
now says:
/net	-hosts nointr,soft,rsize=32768,wsize=32768

This indeed does make things better: the copy still takes way way too long,
but the mouse or system doesn't hang anymore. 

I can however ls the file on the remote server a full 3mn before cp returns
completion on my laptop.
There definitely seems to be something wrong with NFS and buffering, and on my
laptop, when that happens, it seems to hang the intel wireless driver, maybe
because it's failing to allocate pages, and it just sits there, somehow with
interrupts turned off? (but I thought drivers don't turn off interrupts anymore
so I'm not sure what's really going on and why my mouse pointer and screen updates
just stop for a minute or more).

To be honest, the situation is bearable enough now with 32K buffers that I can catch
the slow copy, kill it, and use scp instead, but if you know if appropriate folks
who would want to know about this, let me know.

Just to check, I went back to a copy to remote nfs server using ethernet,
and the file was again copied 2mn before cp returned as complete. There
were however no hangs or problems with the copy.

ps wchan showed:
23267 merlin   cp -i grand.avia /net/garga rpc_wait_bit_killable

SysRq : Show Blocked State
  task                        PC stack   pid father
cp              D ffff8800ae554530     0 23267  11009 0x00000080
 ffff8800ae5a3cc8 0000000000000086 ffff88021e293c80 ffff8800ae554140
 ffff8800ae554140 ffff8800ae5a3fd8 ffff8800ae5a3fd8 0000000000013c80
 ffff88021551e380 ffff8800ae554140 ffff8800ae5a3cb8 ffff8800ae5a3d70
Call Trace:
 [<ffffffffa06f8676>] ? nfs_setattr_update_inode+0xdb/0xdb [nfs]
 [<ffffffff814c12b6>] schedule+0x5f/0x61
 [<ffffffffa06f86d4>] nfs_wait_bit_killable+0x5e/0x78 [nfs]
 [<ffffffff814c0059>] __wait_on_bit_lock+0x41/0x8a
 [<ffffffff814c010f>] out_of_line_wait_on_bit_lock+0x6d/0x78
 [<ffffffffa06f8676>] ? nfs_setattr_update_inode+0xdb/0xdb [nfs]
 [<ffffffff8105d4f0>] ? autoremove_wake_function+0x32/0x32
 [<ffffffffa0702699>] nfs_commit_inode+0x66/0x108 [nfs]
 [<ffffffffa06f6c95>] nfs_file_fsync_commit+0x74/0xb1 [nfs]
 [<ffffffffa099306c>] nfs4_file_fsync+0x6c/0xa3 [nfsv4]
 [<ffffffff81139f1e>] vfs_fsync_range+0x1e/0x20
 [<ffffffff81139f37>] vfs_fsync+0x17/0x19
 [<ffffffffa06f737d>] nfs_file_flush+0x95/0x9a [nfs]
 [<ffffffff81114bdf>] filp_close+0x42/0x75
 [<ffffffff8112b9b3>] __close_fd+0x7c/0x94
 [<ffffffff81114799>] sys_close+0x1b/0x45
 [<ffffffff814c6bed>] system_call_fastpath+0x1a/0x1f

Mount is like this:
gargamel:/mnt/dshelf1 /net/gargamel/mnt/dshelf1 nfs4 rw,nosuid,nodev,relatime,vers=4.0,rsize=32768,wsize=32768,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.205.7,local_lock=none,addr=192.168.205.3 0 0

Can you comment on whether this should be a bug filed with the intel wireless folks?
As for NFS, is it known to be as probablematic as I'm seeing it, or is it just me?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-19  4:05                                 ` 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out Marc MERLIN
@ 2013-02-19  5:17                                   ` Eric Dumazet
  2013-02-19  5:26                                     ` Marc MERLIN
  2013-02-19 10:03                                     ` Johannes Berg
  0 siblings, 2 replies; 72+ messages in thread
From: Eric Dumazet @ 2013-02-19  5:17 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: David Miller, Larry.Finger, bhutchings, linux-wireless, netdev

On Mon, 2013-02-18 at 20:05 -0800, Marc MERLIN wrote:
> On Mon, Jul 16, 2012 at 06:21:57PM +0200, Eric Dumazet wrote:
> > > No, it's atually when I'm 'uploading' from my laptop to my server.
> > > One interesting thing is that my server is running lvm2 with snapshots,
> > > which makes writes slower than my laptop can push data over the network, so
> > > it's definitely causing buffers to fill up.
> > > I just did a download test and got 4.5MB/s sustained without problems.
> > 
> > Hmm, nfs apparently is able to push lot of data, try to reduce
> > rsize/wsize to sane values, like 32K instead of 512K ?
> > 
> > gargamel:/mnt/dshelf2/ /net/gargamel/mnt/dshelf2 nfs4
> > rw,nosuid,nodev,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.205.7,local_lock=none,addr=192.168.205.3 0 0
> > 
> > You could trace svc_sock_setbufsize() and check how large is set
> > sk_sndbuf
> 
> My apologies, I totally dropped the ball on this.
> 
> So, the problem was still there in more recent kernels.
> 
> TL;DR:
> - reducing nfs buffers removes the full hang
> - iwlwifi has a problem where lack of pages causes the whoe machine to hang
> - NFS copies out, even with buffers down to 32K is very wonky and cp does not
>   return until over 2mn after the copy is actually finished.
>   (I have a trace of what's hung in cp/nfs when this happens)
> 
> 
> Details:
> 
> It's still pretty severe because whatever blocks doesn't just end up
> blocking disk IO, but actually blocking interrupts altogether since my mouse
> can't move for a minute or more until some buffer flushes.
> 
> The last trace I got during this (I can't do sysrq because I have a broken 
> Lenovo T530 without a sysrq key, and typing doesn't really work when
> interrupts aren't firing).
> 
> Not sure if it's useful. First chrome had an issue, and then iwlwifi
> 
> chrome: page allocation failure: order:1, mode:0x4020
> Pid: 8730, comm: chrome Tainted: G           O 3.7.8-amd64-preempt-20121226-fixwd #1
> Call Trace:
>  <IRQ>  [<ffffffff810d5f38>] warn_alloc_failed+0x117/0x12c
>  [<ffffffff810d8cfd>] __alloc_pages_nodemask+0x66a/0x702
>  [<ffffffff8108a948>] ? arch_local_irq_save+0x15/0x1b
>  [<ffffffff811064af>] alloc_pages_current+0xcd/0xee
>  [<ffffffffa039b579>] iwl_rx_allocate+0x8c/0x271 [iwlwifi]
>  [<ffffffffa039c24e>] iwl_irq_tasklet+0x7e5/0x91c [iwlwifi]
>  [<ffffffff8104805e>] tasklet_action+0x80/0xd2
>  [<ffffffff81047c99>] __do_softirq+0xdf/0x1c5
>  [<ffffffff814c1ed6>] ? _raw_spin_lock+0x1b/0x1f
>  [<ffffffff810a7f37>] ? handle_irq_event+0x4d/0x62
>  [<ffffffff814c7f5c>] call_softirq+0x1c/0x30
>  [<ffffffff8101104e>] do_softirq+0x41/0x7f
>  [<ffffffff81047e52>] irq_exit+0x3f/0xa7
>  [<ffffffff81010d40>] do_IRQ+0x88/0x9f
>  [<ffffffff814c246d>] common_interrupt+0x6d/0x6d
>  <EOI> Mem-Info:

You could try to load iwlwifi with amsdu_size_8K set to 0 (disable)

It should hopefully use order-0 pages

Some drivers cant fallback to low order page allocations.

mlx4 is another example (it uses order-2 pages )




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-19  5:17                                   ` Eric Dumazet
@ 2013-02-19  5:26                                     ` Marc MERLIN
  2013-02-19 10:03                                     ` Johannes Berg
  1 sibling, 0 replies; 72+ messages in thread
From: Marc MERLIN @ 2013-02-19  5:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Larry.Finger, bhutchings, linux-wireless, netdev

On Mon, Feb 18, 2013 at 09:17:13PM -0800, Eric Dumazet wrote:
> > chrome: page allocation failure: order:1, mode:0x4020
> > Pid: 8730, comm: chrome Tainted: G           O 3.7.8-amd64-preempt-20121226-fixwd #1
> > Call Trace:
> >  <IRQ>  [<ffffffff810d5f38>] warn_alloc_failed+0x117/0x12c
> >  [<ffffffff810d8cfd>] __alloc_pages_nodemask+0x66a/0x702
> >  [<ffffffff8108a948>] ? arch_local_irq_save+0x15/0x1b
> >  [<ffffffff811064af>] alloc_pages_current+0xcd/0xee
> >  [<ffffffffa039b579>] iwl_rx_allocate+0x8c/0x271 [iwlwifi]
> >  [<ffffffffa039c24e>] iwl_irq_tasklet+0x7e5/0x91c [iwlwifi]
> >  [<ffffffff8104805e>] tasklet_action+0x80/0xd2
> >  [<ffffffff81047c99>] __do_softirq+0xdf/0x1c5
> >  [<ffffffff814c1ed6>] ? _raw_spin_lock+0x1b/0x1f
> >  [<ffffffff810a7f37>] ? handle_irq_event+0x4d/0x62
> >  [<ffffffff814c7f5c>] call_softirq+0x1c/0x30
> >  [<ffffffff8101104e>] do_softirq+0x41/0x7f
> >  [<ffffffff81047e52>] irq_exit+0x3f/0xa7
> >  [<ffffffff81010d40>] do_IRQ+0x88/0x9f
> >  [<ffffffff814c246d>] common_interrupt+0x6d/0x6d
> >  <EOI> Mem-Info:
> 
> You could try to load iwlwifi with amsdu_size_8K set to 0 (disable)
> 
> It should hopefully use order-0 pages
> 
> Some drivers cant fallback to low order page allocations.
> 
> mlx4 is another example (it uses order-2 pages )

Well, now that I changed NFS to 32K buffering, I won't have the problem
anyway, so the problem is effectively fixed enough for me.
But because the default behaviour is so bad, I wanted to help fix the
defaults so that it doesn't happen to others if possible.

I haven't been following the NFS world recently, so I don't know if the code
is known broken beyond fixing, or if I did find an unusual problem that is
worth looking into.

If somehow I'm the only one, then let's not worry about it :)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-19  5:17                                   ` Eric Dumazet
  2013-02-19  5:26                                     ` Marc MERLIN
@ 2013-02-19 10:03                                     ` Johannes Berg
  2013-02-19 16:18                                       ` Marc MERLIN
  2013-02-19 16:21                                       ` Eric Dumazet
  1 sibling, 2 replies; 72+ messages in thread
From: Johannes Berg @ 2013-02-19 10:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Mon, 2013-02-18 at 21:17 -0800, Eric Dumazet wrote:

> > chrome: page allocation failure: order:1, mode:0x4020
> > Pid: 8730, comm: chrome Tainted: G           O 3.7.8-amd64-preempt-20121226-fixwd #1
> > Call Trace:
> >  <IRQ>  [<ffffffff810d5f38>] warn_alloc_failed+0x117/0x12c

> You could try to load iwlwifi with amsdu_size_8K set to 0 (disable)
> 
> It should hopefully use order-0 pages

It will, do that then, unfortunately it can't switch at runtime because
it advertised this support to the access point or clients.

johannes


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-19 10:03                                     ` Johannes Berg
@ 2013-02-19 16:18                                       ` Marc MERLIN
  2013-02-19 16:36                                         ` Eric Dumazet
  2013-02-19 16:21                                       ` Eric Dumazet
  1 sibling, 1 reply; 72+ messages in thread
From: Marc MERLIN @ 2013-02-19 16:18 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Eric Dumazet, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Tue, Feb 19, 2013 at 11:03:41AM +0100, Johannes Berg wrote:
> On Mon, 2013-02-18 at 21:17 -0800, Eric Dumazet wrote:
> 
> > > chrome: page allocation failure: order:1, mode:0x4020
> > > Pid: 8730, comm: chrome Tainted: G           O 3.7.8-amd64-preempt-20121226-fixwd #1
> > > Call Trace:
> > >  <IRQ>  [<ffffffff810d5f38>] warn_alloc_failed+0x117/0x12c
> 
> > You could try to load iwlwifi with amsdu_size_8K set to 0 (disable)
> > 
> > It should hopefully use order-0 pages
> 
> It will, do that then, unfortunately it can't switch at runtime because
> it advertised this support to the access point or clients.

Forgive the newbie question: would the inability to get pages cause
interrupts to be disabled or at least something that would stop the mouse
cursor and the keyboard from working?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-19 10:03                                     ` Johannes Berg
  2013-02-19 16:18                                       ` Marc MERLIN
@ 2013-02-19 16:21                                       ` Eric Dumazet
  2013-02-20  9:12                                         ` Johannes Berg
  1 sibling, 1 reply; 72+ messages in thread
From: Eric Dumazet @ 2013-02-19 16:21 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Tue, 2013-02-19 at 11:03 +0100, Johannes Berg wrote:
> On Mon, 2013-02-18 at 21:17 -0800, Eric Dumazet wrote:
> 
> > > chrome: page allocation failure: order:1, mode:0x4020
> > > Pid: 8730, comm: chrome Tainted: G           O 3.7.8-amd64-preempt-20121226-fixwd #1
> > > Call Trace:
> > >  <IRQ>  [<ffffffff810d5f38>] warn_alloc_failed+0x117/0x12c
> 
> > You could try to load iwlwifi with amsdu_size_8K set to 0 (disable)
> > 
> > It should hopefully use order-0 pages
> 
> It will, do that then, unfortunately it can't switch at runtime because
> it advertised this support to the access point or clients.

What are the drawbacks of setting amsdu_size_8K to 0 by default ?

Alternative would be :

- As long as we cant allocate order-1 pages (to refill the RX ring),
copy incoming frames into order-0 pages, so that order-1 pages are still
available for the NIC.




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-19 16:18                                       ` Marc MERLIN
@ 2013-02-19 16:36                                         ` Eric Dumazet
  0 siblings, 0 replies; 72+ messages in thread
From: Eric Dumazet @ 2013-02-19 16:36 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: Johannes Berg, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Tue, 2013-02-19 at 08:18 -0800, Marc MERLIN wrote:

> 
> Forgive the newbie question: would the inability to get pages cause
> interrupts to be disabled or at least something that would stop the mouse
> cursor and the keyboard from working?
> 

Maybe these things also need order-1 pages to work ;)

Any kmem_cache using order-1 or order-2 pages can not allocate new slabs

For example, TCP sockets can not be allocated anymore, and fork() might
not work anymore (kernel stacks need order-1 pages)

I dont remember your case, but do you use CONFIG_COMPACTION=y ?




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-19 16:21                                       ` Eric Dumazet
@ 2013-02-20  9:12                                         ` Johannes Berg
  2013-02-20  9:15                                           ` Johannes Berg
  0 siblings, 1 reply; 72+ messages in thread
From: Johannes Berg @ 2013-02-20  9:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Tue, 2013-02-19 at 08:21 -0800, Eric Dumazet wrote:
> On Tue, 2013-02-19 at 11:03 +0100, Johannes Berg wrote:
> > On Mon, 2013-02-18 at 21:17 -0800, Eric Dumazet wrote:
> > 
> > > > chrome: page allocation failure: order:1, mode:0x4020
> > > > Pid: 8730, comm: chrome Tainted: G           O 3.7.8-amd64-preempt-20121226-fixwd #1
> > > > Call Trace:
> > > >  <IRQ>  [<ffffffff810d5f38>] warn_alloc_failed+0x117/0x12c
> > 
> > > You could try to load iwlwifi with amsdu_size_8K set to 0 (disable)
> > > 
> > > It should hopefully use order-0 pages
> > 
> > It will, do that then, unfortunately it can't switch at runtime because
> > it advertised this support to the access point or clients.
> 
> What are the drawbacks of setting amsdu_size_8K to 0 by default ?

We're discussing this now, the only downside would be that we couldn't
receive 8k A-MSDUs. Thing is, practically nobody uses A-MSDU anyway, and
even when I suspect the difference between 4k and 8k won't be huge.

johannes


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-20  9:12                                         ` Johannes Berg
@ 2013-02-20  9:15                                           ` Johannes Berg
  2013-02-20 15:11                                             ` Eric Dumazet
  2013-02-23  6:14                                             ` Marc MERLIN
  0 siblings, 2 replies; 72+ messages in thread
From: Johannes Berg @ 2013-02-20  9:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Wed, 2013-02-20 at 10:12 +0100, Johannes Berg wrote:
> On Tue, 2013-02-19 at 08:21 -0800, Eric Dumazet wrote:
> > On Tue, 2013-02-19 at 11:03 +0100, Johannes Berg wrote:
> > > On Mon, 2013-02-18 at 21:17 -0800, Eric Dumazet wrote:
> > > 
> > > > > chrome: page allocation failure: order:1, mode:0x4020
> > > > > Pid: 8730, comm: chrome Tainted: G           O 3.7.8-amd64-preempt-20121226-fixwd #1
> > > > > Call Trace:
> > > > >  <IRQ>  [<ffffffff810d5f38>] warn_alloc_failed+0x117/0x12c
> > > 
> > > > You could try to load iwlwifi with amsdu_size_8K set to 0 (disable)
> > > > 
> > > > It should hopefully use order-0 pages
> > > 
> > > It will, do that then, unfortunately it can't switch at runtime because
> > > it advertised this support to the access point or clients.
> > 
> > What are the drawbacks of setting amsdu_size_8K to 0 by default ?
> 
> We're discussing this now, the only downside would be that we couldn't
> receive 8k A-MSDUs. Thing is, practically nobody uses A-MSDU anyway, and
> even when I suspect the difference between 4k and 8k won't be huge.

OTOH, this affects the protocol, and when you really can't allocate any
order-1 pages you pointed out yourself that many other things also won't
work, so I'm not really sure it makes a big difference if we change the
driver?

johannes


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-20  9:15                                           ` Johannes Berg
@ 2013-02-20 15:11                                             ` Eric Dumazet
  2013-02-20 16:20                                                 ` Johannes Berg
  2013-02-20 20:09                                               ` Johannes Berg
  2013-02-23  6:14                                             ` Marc MERLIN
  1 sibling, 2 replies; 72+ messages in thread
From: Eric Dumazet @ 2013-02-20 15:11 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Wed, 2013-02-20 at 10:15 +0100, Johannes Berg wrote:

> OTOH, this affects the protocol, and when you really can't allocate any
> order-1 pages you pointed out yourself that many other things also won't
> work, so I'm not really sure it makes a big difference if we change the
> driver?

It will make a huge difference, even on non pressure mode, as TCP
receive window will grow twice faster.

Unless wifi speed reaches 10Gbps, following patch should do the trick


diff --git a/drivers/net/wireless/iwlwifi/dvm/rx.c b/drivers/net/wireless/iwlwifi/dvm/rx.c
index a4eed20..77a3ee3 100644
--- a/drivers/net/wireless/iwlwifi/dvm/rx.c
+++ b/drivers/net/wireless/iwlwifi/dvm/rx.c
@@ -750,7 +750,12 @@ static void iwlagn_pass_packet_to_mac80211(struct iwl_priv *priv,
 	/* Dont use dev_alloc_skb(), we'll have enough headroom once
 	 * ieee80211_hdr pulled.
 	 */
-	skb = alloc_skb(128, GFP_ATOMIC);
+	fraglen = 128;
+	/* if we use order-1 pages, copy to get better TCP performance */
+	if (rxb->truesize > PAGE_SIZE)
+		fraglen = max_t(unsigned, fraglen, len);
+
+	skb = alloc_skb(fraglen, GFP_ATOMIC);
 	if (!skb) {
 		IWL_ERR(priv, "alloc_skb failed\n");
 		return;



^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-20 15:11                                             ` Eric Dumazet
@ 2013-02-20 16:20                                                 ` Johannes Berg
  2013-02-20 20:09                                               ` Johannes Berg
  1 sibling, 0 replies; 72+ messages in thread
From: Johannes Berg @ 2013-02-20 16:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Wed, 2013-02-20 at 07:11 -0800, Eric Dumazet wrote:
> On Wed, 2013-02-20 at 10:15 +0100, Johannes Berg wrote:
> 
> > OTOH, this affects the protocol, and when you really can't allocate any
> > order-1 pages you pointed out yourself that many other things also won't
> > work, so I'm not really sure it makes a big difference if we change the
> > driver?
> 
> It will make a huge difference, even on non pressure mode, as TCP
> receive window will grow twice faster.

Hmm, why does that depend on the allocation size?

> Unless wifi speed reaches 10Gbps, following patch should do the trick
> 
> 
> diff --git a/drivers/net/wireless/iwlwifi/dvm/rx.c b/drivers/net/wireless/iwlwifi/dvm/rx.c
> index a4eed20..77a3ee3 100644
> --- a/drivers/net/wireless/iwlwifi/dvm/rx.c
> +++ b/drivers/net/wireless/iwlwifi/dvm/rx.c
> @@ -750,7 +750,12 @@ static void iwlagn_pass_packet_to_mac80211(struct iwl_priv *priv,
>  	/* Dont use dev_alloc_skb(), we'll have enough headroom once
>  	 * ieee80211_hdr pulled.
>  	 */
> -	skb = alloc_skb(128, GFP_ATOMIC);
> +	fraglen = 128;
> +	/* if we use order-1 pages, copy to get better TCP performance */
> +	if (rxb->truesize > PAGE_SIZE)
> +		fraglen = max_t(unsigned, fraglen, len);
> +
> +	skb = alloc_skb(fraglen, GFP_ATOMIC);

Hmm, I don't quite understand -- that's not doing any copy?

FWIW if you do the copy you should not "steal" the pages, then they'd be
recycled in the RX ring right away.

johannes


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
@ 2013-02-20 16:20                                                 ` Johannes Berg
  0 siblings, 0 replies; 72+ messages in thread
From: Johannes Berg @ 2013-02-20 16:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marc MERLIN, David Miller, Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ,
	bhutchings-s/n/eUQHGBpZroRs9YW3xA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA

On Wed, 2013-02-20 at 07:11 -0800, Eric Dumazet wrote:
> On Wed, 2013-02-20 at 10:15 +0100, Johannes Berg wrote:
> 
> > OTOH, this affects the protocol, and when you really can't allocate any
> > order-1 pages you pointed out yourself that many other things also won't
> > work, so I'm not really sure it makes a big difference if we change the
> > driver?
> 
> It will make a huge difference, even on non pressure mode, as TCP
> receive window will grow twice faster.

Hmm, why does that depend on the allocation size?

> Unless wifi speed reaches 10Gbps, following patch should do the trick
> 
> 
> diff --git a/drivers/net/wireless/iwlwifi/dvm/rx.c b/drivers/net/wireless/iwlwifi/dvm/rx.c
> index a4eed20..77a3ee3 100644
> --- a/drivers/net/wireless/iwlwifi/dvm/rx.c
> +++ b/drivers/net/wireless/iwlwifi/dvm/rx.c
> @@ -750,7 +750,12 @@ static void iwlagn_pass_packet_to_mac80211(struct iwl_priv *priv,
>  	/* Dont use dev_alloc_skb(), we'll have enough headroom once
>  	 * ieee80211_hdr pulled.
>  	 */
> -	skb = alloc_skb(128, GFP_ATOMIC);
> +	fraglen = 128;
> +	/* if we use order-1 pages, copy to get better TCP performance */
> +	if (rxb->truesize > PAGE_SIZE)
> +		fraglen = max_t(unsigned, fraglen, len);
> +
> +	skb = alloc_skb(fraglen, GFP_ATOMIC);

Hmm, I don't quite understand -- that's not doing any copy?

FWIW if you do the copy you should not "steal" the pages, then they'd be
recycled in the RX ring right away.

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
@ 2013-02-20 16:55                                                   ` Eric Dumazet
  0 siblings, 0 replies; 72+ messages in thread
From: Eric Dumazet @ 2013-02-20 16:55 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Wed, 2013-02-20 at 17:20 +0100, Johannes Berg wrote:
> On Wed, 2013-02-20 at 07:11 -0800, Eric Dumazet wrote:
> > On Wed, 2013-02-20 at 10:15 +0100, Johannes Berg wrote:
> > 
> > > OTOH, this affects the protocol, and when you really can't allocate any
> > > order-1 pages you pointed out yourself that many other things also won't
> > > work, so I'm not really sure it makes a big difference if we change the
> > > driver?
> > 
> > It will make a huge difference, even on non pressure mode, as TCP
> > receive window will grow twice faster.
> 
> Hmm, why does that depend on the allocation size?

I guess you missed all the patches about skb->truesize on netdev

> 
> > Unless wifi speed reaches 10Gbps, following patch should do the trick
> > 
> > 
> > diff --git a/drivers/net/wireless/iwlwifi/dvm/rx.c b/drivers/net/wireless/iwlwifi/dvm/rx.c
> > index a4eed20..77a3ee3 100644
> > --- a/drivers/net/wireless/iwlwifi/dvm/rx.c
> > +++ b/drivers/net/wireless/iwlwifi/dvm/rx.c
> > @@ -750,7 +750,12 @@ static void iwlagn_pass_packet_to_mac80211(struct iwl_priv *priv,
> >  	/* Dont use dev_alloc_skb(), we'll have enough headroom once
> >  	 * ieee80211_hdr pulled.
> >  	 */
> > -	skb = alloc_skb(128, GFP_ATOMIC);
> > +	fraglen = 128;
> > +	/* if we use order-1 pages, copy to get better TCP performance */
> > +	if (rxb->truesize > PAGE_SIZE)
> > +		fraglen = max_t(unsigned, fraglen, len);
> > +
> > +	skb = alloc_skb(fraglen, GFP_ATOMIC);
> 
> Hmm, I don't quite understand -- that's not doing any copy?
> 
> FWIW if you do the copy you should not "steal" the pages, then they'd be
> recycled in the RX ring right away.

Code should just works, please read the following lines in the same
function....

        /* If frame is small enough to fit in skb->head, pull it completely.
         * If not, only pull ieee80211_hdr so that splice() or TCP coalesce
         * are more efficient.
         */
        hdrlen = (len <= skb_tailroom(skb)) ? len : sizeof(*hdr);

        memcpy(skb_put(skb, hdrlen), hdr, hdrlen);
        fraglen = len - hdrlen;

        if (fraglen) {
                int offset = (void *)hdr + hdrlen -
                             rxb_addr(rxb) + rxb_offset(rxb);

                skb_add_rx_frag(skb, 0, rxb_steal_page(rxb), offset,
                                fraglen, rxb->truesize);
        }




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
@ 2013-02-20 16:55                                                   ` Eric Dumazet
  0 siblings, 0 replies; 72+ messages in thread
From: Eric Dumazet @ 2013-02-20 16:55 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Marc MERLIN, David Miller, Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ,
	bhutchings-s/n/eUQHGBpZroRs9YW3xA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA

On Wed, 2013-02-20 at 17:20 +0100, Johannes Berg wrote:
> On Wed, 2013-02-20 at 07:11 -0800, Eric Dumazet wrote:
> > On Wed, 2013-02-20 at 10:15 +0100, Johannes Berg wrote:
> > 
> > > OTOH, this affects the protocol, and when you really can't allocate any
> > > order-1 pages you pointed out yourself that many other things also won't
> > > work, so I'm not really sure it makes a big difference if we change the
> > > driver?
> > 
> > It will make a huge difference, even on non pressure mode, as TCP
> > receive window will grow twice faster.
> 
> Hmm, why does that depend on the allocation size?

I guess you missed all the patches about skb->truesize on netdev

> 
> > Unless wifi speed reaches 10Gbps, following patch should do the trick
> > 
> > 
> > diff --git a/drivers/net/wireless/iwlwifi/dvm/rx.c b/drivers/net/wireless/iwlwifi/dvm/rx.c
> > index a4eed20..77a3ee3 100644
> > --- a/drivers/net/wireless/iwlwifi/dvm/rx.c
> > +++ b/drivers/net/wireless/iwlwifi/dvm/rx.c
> > @@ -750,7 +750,12 @@ static void iwlagn_pass_packet_to_mac80211(struct iwl_priv *priv,
> >  	/* Dont use dev_alloc_skb(), we'll have enough headroom once
> >  	 * ieee80211_hdr pulled.
> >  	 */
> > -	skb = alloc_skb(128, GFP_ATOMIC);
> > +	fraglen = 128;
> > +	/* if we use order-1 pages, copy to get better TCP performance */
> > +	if (rxb->truesize > PAGE_SIZE)
> > +		fraglen = max_t(unsigned, fraglen, len);
> > +
> > +	skb = alloc_skb(fraglen, GFP_ATOMIC);
> 
> Hmm, I don't quite understand -- that's not doing any copy?
> 
> FWIW if you do the copy you should not "steal" the pages, then they'd be
> recycled in the RX ring right away.

Code should just works, please read the following lines in the same
function....

        /* If frame is small enough to fit in skb->head, pull it completely.
         * If not, only pull ieee80211_hdr so that splice() or TCP coalesce
         * are more efficient.
         */
        hdrlen = (len <= skb_tailroom(skb)) ? len : sizeof(*hdr);

        memcpy(skb_put(skb, hdrlen), hdr, hdrlen);
        fraglen = len - hdrlen;

        if (fraglen) {
                int offset = (void *)hdr + hdrlen -
                             rxb_addr(rxb) + rxb_offset(rxb);

                skb_add_rx_frag(skb, 0, rxb_steal_page(rxb), offset,
                                fraglen, rxb->truesize);
        }



--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-20 16:55                                                   ` Eric Dumazet
  (?)
@ 2013-02-20 16:59                                                   ` Johannes Berg
  2013-02-20 17:39                                                     ` Eric Dumazet
  -1 siblings, 1 reply; 72+ messages in thread
From: Johannes Berg @ 2013-02-20 16:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Wed, 2013-02-20 at 08:55 -0800, Eric Dumazet wrote:

> > > It will make a huge difference, even on non pressure mode, as TCP
> > > receive window will grow twice faster.
> > 
> > Hmm, why does that depend on the allocation size?
> 
> I guess you missed all the patches about skb->truesize on netdev

Yeah, I don't follow netdev much any more...

> > > -	skb = alloc_skb(128, GFP_ATOMIC);
> > > +	fraglen = 128;
> > > +	/* if we use order-1 pages, copy to get better TCP performance */
> > > +	if (rxb->truesize > PAGE_SIZE)
> > > +		fraglen = max_t(unsigned, fraglen, len);
> > > +
> > > +	skb = alloc_skb(fraglen, GFP_ATOMIC);
> > 
> > Hmm, I don't quite understand -- that's not doing any copy?
> > 
> > FWIW if you do the copy you should not "steal" the pages, then they'd be
> > recycled in the RX ring right away.
> 
> Code should just works, please read the following lines in the same
> function....
> 
>         /* If frame is small enough to fit in skb->head, pull it completely.
>          * If not, only pull ieee80211_hdr so that splice() or TCP coalesce
>          * are more efficient.
>          */

Oh, right, though I guess the comment is now wrong since practically
every packet will be copied either here or in mac80211 (A-MSDUs are
split up there)

johannes


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-20 16:55                                                   ` Eric Dumazet
  (?)
  (?)
@ 2013-02-20 17:01                                                   ` Johannes Berg
  2013-02-20 17:24                                                     ` Eric Dumazet
  -1 siblings, 1 reply; 72+ messages in thread
From: Johannes Berg @ 2013-02-20 17:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Wed, 2013-02-20 at 08:55 -0800, Eric Dumazet wrote:

> > > Unless wifi speed reaches 10Gbps, following patch should do the trick
> > > 
> > > 
> > > diff --git a/drivers/net/wireless/iwlwifi/dvm/rx.c b/drivers/net/wireless/iwlwifi/dvm/rx.c
> > > index a4eed20..77a3ee3 100644
> > > --- a/drivers/net/wireless/iwlwifi/dvm/rx.c
> > > +++ b/drivers/net/wireless/iwlwifi/dvm/rx.c
> > > @@ -750,7 +750,12 @@ static void iwlagn_pass_packet_to_mac80211(struct iwl_priv *priv,
> > >  	/* Dont use dev_alloc_skb(), we'll have enough headroom once
> > >  	 * ieee80211_hdr pulled.
> > >  	 */
> > > -	skb = alloc_skb(128, GFP_ATOMIC);
> > > +	fraglen = 128;
> > > +	/* if we use order-1 pages, copy to get better TCP performance */
> > > +	if (rxb->truesize > PAGE_SIZE)
> > > +		fraglen = max_t(unsigned, fraglen, len);
> > > +
> > > +	skb = alloc_skb(fraglen, GFP_ATOMIC);
> > 
> > Hmm, I don't quite understand -- that's not doing any copy?
> > 
> > FWIW if you do the copy you should not "steal" the pages, then they'd be
> > recycled in the RX ring right away.
> 
> Code should just works, please read the following lines in the same
> function....

FWIW, I think just using order-0 pages and turning 8k A-MSDUs off by
default makes more sense, A-MSDU is rarely used to begin with ...

Also, if we copy larger frames here, we should also take into account
the (variable) 802.11 header length to avoid copying into a position
where the IP header ends up being unaligned.

johannes


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-20 17:01                                                   ` Johannes Berg
@ 2013-02-20 17:24                                                     ` Eric Dumazet
  2013-02-20 18:16                                                       ` Johannes Berg
  0 siblings, 1 reply; 72+ messages in thread
From: Eric Dumazet @ 2013-02-20 17:24 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Wed, 2013-02-20 at 18:01 +0100, Johannes Berg wrote:

> FWIW, I think just using order-0 pages and turning 8k A-MSDUs off by
> default makes more sense, A-MSDU is rarely used to begin with ...
> 

My suggested patch makes sure that if someone needs 8k A-MSDU, iwlwifi
still works correctly.

> Also, if we copy larger frames here, we should also take into account
> the (variable) 802.11 header length to avoid copying into a position
> where the IP header ends up being unaligned.

But the current code already has this problem (if its a problem at all,
as on x86 NET_IP_ALIGN is 0)

I only increase available tailroom in the skb. Right now there is enough
room for IP and tcp headers.




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-20 16:59                                                   ` Johannes Berg
@ 2013-02-20 17:39                                                     ` Eric Dumazet
  0 siblings, 0 replies; 72+ messages in thread
From: Eric Dumazet @ 2013-02-20 17:39 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Wed, 2013-02-20 at 17:59 +0100, Johannes Berg wrote:

> Yeah, I don't follow netdev much any more...
> 

Short answer : tcp stack has autotuning sk_rcvbuf, tracking _memory_ use
of a socket.

If network layer provides 8k fat skbs (holding 1500 bytes), the
advertised TCP receive window will be much smaller than if network layer
provides 2k skbs.

Its quite visible in a tcpdump in the beginning of a tcp session.

You'll see how linux grows the receive window, before and after my
patch. You'll see how the sender can send its data faster.

Providing nice skbs is a way to speedup Internet browsing, as most http
traffic happen exactly while tcp receiver didnt yet advertised a big
window. (Especially visible with large RTT)

> > 
> >         /* If frame is small enough to fit in skb->head, pull it completely.
> >          * If not, only pull ieee80211_hdr so that splice() or TCP coalesce
> >          * are more efficient.
> >          */
> 
> Oh, right, though I guess the comment is now wrong since practically
> every packet will be copied either here or in mac80211 (A-MSDUs are
> split up there)

Comment is not 'wrong', it describes what happens here.

If there is enough room in skb->head to avoid attaching a page fragment
to the skb, lets copy the frame so that the page fragment can be reused
by the NIC driver immediately.




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-20 17:24                                                     ` Eric Dumazet
@ 2013-02-20 18:16                                                       ` Johannes Berg
  2013-02-20 19:17                                                         ` Eric Dumazet
  0 siblings, 1 reply; 72+ messages in thread
From: Johannes Berg @ 2013-02-20 18:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Wed, 2013-02-20 at 09:24 -0800, Eric Dumazet wrote:
> On Wed, 2013-02-20 at 18:01 +0100, Johannes Berg wrote:
> 
> > FWIW, I think just using order-0 pages and turning 8k A-MSDUs off by
> > default makes more sense, A-MSDU is rarely used to begin with ...
> > 
> 
> My suggested patch makes sure that if someone needs 8k A-MSDU, iwlwifi
> still works correctly.

True, but if we don't advertise 8k A-MSDU then they won't be used, and
it'll still work correctly, just not be able to receive such large
A-MSDUs. Linux can't even transmit _any_ A-MSDUs, and many (most?) APs
don't either.

> > Also, if we copy larger frames here, we should also take into account
> > the (variable) 802.11 header length to avoid copying into a position
> > where the IP header ends up being unaligned.
> 
> But the current code already has this problem (if its a problem at all,
> as on x86 NET_IP_ALIGN is 0)

Yes and no. The current code doesn't pull in much data, not even the IP
header, so worst case it copies 14 bytes (ethernet header) in mac80211
to obtain alignment. By pulling in everything here, it later has to copy
everything again later to obtain alignment.

johannes


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-20 18:16                                                       ` Johannes Berg
@ 2013-02-20 19:17                                                         ` Eric Dumazet
  2013-02-20 19:58                                                             ` Johannes Berg
  0 siblings, 1 reply; 72+ messages in thread
From: Eric Dumazet @ 2013-02-20 19:17 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Wed, 2013-02-20 at 19:16 +0100, Johannes Berg wrote:

> Yes and no. The current code doesn't pull in much data, not even the IP
> header, so worst case it copies 14 bytes (ethernet header) in mac80211
> to obtain alignment. By pulling in everything here, it later has to copy
> everything again later to obtain alignment.

Can you give me pointers to the code doing that ?

On x86 we should not doing anything.




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-20 19:17                                                         ` Eric Dumazet
@ 2013-02-20 19:58                                                             ` Johannes Berg
  0 siblings, 0 replies; 72+ messages in thread
From: Johannes Berg @ 2013-02-20 19:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Wed, 2013-02-20 at 11:17 -0800, Eric Dumazet wrote:
> On Wed, 2013-02-20 at 19:16 +0100, Johannes Berg wrote:
> 
> > Yes and no. The current code doesn't pull in much data, not even the IP
> > header, so worst case it copies 14 bytes (ethernet header) in mac80211
> > to obtain alignment. By pulling in everything here, it later has to copy
> > everything again later to obtain alignment.
> 
> Can you give me pointers to the code doing that ?

Sure: net/mac80211/rx.c ~line 1900, look for #ifndef
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.

> On x86 we should not doing anything.

We don't do anything if CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is
defined.

johannes


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
@ 2013-02-20 19:58                                                             ` Johannes Berg
  0 siblings, 0 replies; 72+ messages in thread
From: Johannes Berg @ 2013-02-20 19:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marc MERLIN, David Miller, Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ,
	bhutchings-s/n/eUQHGBpZroRs9YW3xA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA

On Wed, 2013-02-20 at 11:17 -0800, Eric Dumazet wrote:
> On Wed, 2013-02-20 at 19:16 +0100, Johannes Berg wrote:
> 
> > Yes and no. The current code doesn't pull in much data, not even the IP
> > header, so worst case it copies 14 bytes (ethernet header) in mac80211
> > to obtain alignment. By pulling in everything here, it later has to copy
> > everything again later to obtain alignment.
> 
> Can you give me pointers to the code doing that ?

Sure: net/mac80211/rx.c ~line 1900, look for #ifndef
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.

> On x86 we should not doing anything.

We don't do anything if CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is
defined.

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-20 15:11                                             ` Eric Dumazet
  2013-02-20 16:20                                                 ` Johannes Berg
@ 2013-02-20 20:09                                               ` Johannes Berg
  1 sibling, 0 replies; 72+ messages in thread
From: Johannes Berg @ 2013-02-20 20:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Wed, 2013-02-20 at 07:11 -0800, Eric Dumazet wrote:

> diff --git a/drivers/net/wireless/iwlwifi/dvm/rx.c b/drivers/net/wireless/iwlwifi/dvm/rx.c
> index a4eed20..77a3ee3 100644
> --- a/drivers/net/wireless/iwlwifi/dvm/rx.c
> +++ b/drivers/net/wireless/iwlwifi/dvm/rx.c
> @@ -750,7 +750,12 @@ static void iwlagn_pass_packet_to_mac80211(struct iwl_priv *priv,
>  	/* Dont use dev_alloc_skb(), we'll have enough headroom once
>  	 * ieee80211_hdr pulled.
>  	 */
> -	skb = alloc_skb(128, GFP_ATOMIC);
> +	fraglen = 128;
> +	/* if we use order-1 pages, copy to get better TCP performance */
> +	if (rxb->truesize > PAGE_SIZE)
> +		fraglen = max_t(unsigned, fraglen, len);

Btw, shouldn't this rather be something like

if (rxb->truesize > PAGE_SIZE && len < SKB_WITH_OVERHEAD(PAGE_SIZE))

to avoid allocating large SKBs in the case that this actually is a big
8k A-MSDU? That wouldn't be given to the TCP RX path anyway because
mac80211 will split the A-MSDU into MSDUs and memcpy along the way.

Anyway we're preparing a patch for 4k A-MSDU (which is really just
something like 3782 bytes or something).

johannes


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-20 19:58                                                             ` Johannes Berg
  (?)
@ 2013-02-20 20:14                                                             ` Eric Dumazet
  2013-02-20 20:27                                                               ` Johannes Berg
  -1 siblings, 1 reply; 72+ messages in thread
From: Eric Dumazet @ 2013-02-20 20:14 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Wed, 2013-02-20 at 20:58 +0100, Johannes Berg wrote:

> Sure: net/mac80211/rx.c ~line 1900, look for #ifndef
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.
> 
> > On x86 we should not doing anything.
> 
> We don't do anything if CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is
> defined.

Thanks !

BTW, comment is outdated, as it mentions __skb_push(), while
its not used.




^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-20 20:14                                                             ` Eric Dumazet
@ 2013-02-20 20:27                                                               ` Johannes Berg
  0 siblings, 0 replies; 72+ messages in thread
From: Johannes Berg @ 2013-02-20 20:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Wed, 2013-02-20 at 12:14 -0800, Eric Dumazet wrote:
> On Wed, 2013-02-20 at 20:58 +0100, Johannes Berg wrote:
> 
> > Sure: net/mac80211/rx.c ~line 1900, look for #ifndef
> > CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.
> > 
> > > On x86 we should not doing anything.
> > 
> > We don't do anything if CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is
> > defined.
> 
> Thanks !

:-)
Hey, I introduced that config symbol back then for precisely this
reason ;-)

> BTW, comment is outdated, as it mentions __skb_push(), while
> its not used.

Oops, good point, now I wonder what it did actually refer to. Time to
dig in git history, I'll fix it.

johannes


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-20  9:15                                           ` Johannes Berg
  2013-02-20 15:11                                             ` Eric Dumazet
@ 2013-02-23  6:14                                             ` Marc MERLIN
  2013-02-26 20:54                                               ` Johannes Berg
  1 sibling, 1 reply; 72+ messages in thread
From: Marc MERLIN @ 2013-02-23  6:14 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Eric Dumazet, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Wed, Feb 20, 2013 at 10:15:03AM +0100, Johannes Berg wrote:
> On Wed, 2013-02-20 at 10:12 +0100, Johannes Berg wrote:
> > On Tue, 2013-02-19 at 08:21 -0800, Eric Dumazet wrote:
> > > On Tue, 2013-02-19 at 11:03 +0100, Johannes Berg wrote:
> > > > On Mon, 2013-02-18 at 21:17 -0800, Eric Dumazet wrote:
> > > > 
> > > > > > chrome: page allocation failure: order:1, mode:0x4020
> > > > > > Pid: 8730, comm: chrome Tainted: G           O 3.7.8-amd64-preempt-20121226-fixwd #1
> > > > > > Call Trace:
> > > > > >  <IRQ>  [<ffffffff810d5f38>] warn_alloc_failed+0x117/0x12c
> > > > 
> > > > > You could try to load iwlwifi with amsdu_size_8K set to 0 (disable)
> > > > > 
> > > > > It should hopefully use order-0 pages
> > > > 
> > > > It will, do that then, unfortunately it can't switch at runtime because
> > > > it advertised this support to the access point or clients.
> > > 
> > > What are the drawbacks of setting amsdu_size_8K to 0 by default ?
> > 
> > We're discussing this now, the only downside would be that we couldn't
> > receive 8k A-MSDUs. Thing is, practically nobody uses A-MSDU anyway, and
> > even when I suspect the difference between 4k and 8k won't be huge.
> 
> OTOH, this affects the protocol, and when you really can't allocate any
> order-1 pages you pointed out yourself that many other things also won't
> work, so I'm not really sure it makes a big difference if we change the
> driver?

That as an unscientific test, but when I did the NFS eats all my pages
test using ethernet, my system didn't hang like it did with iwlagn.

So while the NFS code is definitely doing something wrong when it uses
its default huge buffers, the e1000e code deals with it without hanging
my system.

So thanks for trying to improve the iwlagn code to avoid those system
lockups.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-02-23  6:14                                             ` Marc MERLIN
@ 2013-02-26 20:54                                               ` Johannes Berg
  2013-06-18 16:52                                                   ` Eric Dumazet
  0 siblings, 1 reply; 72+ messages in thread
From: Johannes Berg @ 2013-02-26 20:54 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: Eric Dumazet, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Fri, 2013-02-22 at 22:14 -0800, Marc MERLIN wrote:
> On Wed, Feb 20, 2013 at 10:15:03AM +0100, Johannes Berg wrote:
> > On Wed, 2013-02-20 at 10:12 +0100, Johannes Berg wrote:
> > > On Tue, 2013-02-19 at 08:21 -0800, Eric Dumazet wrote:
> > > > On Tue, 2013-02-19 at 11:03 +0100, Johannes Berg wrote:
> > > > > On Mon, 2013-02-18 at 21:17 -0800, Eric Dumazet wrote:
> > > > > 
> > > > > > > chrome: page allocation failure: order:1, mode:0x4020
> > > > > > > Pid: 8730, comm: chrome Tainted: G           O 3.7.8-amd64-preempt-20121226-fixwd #1
> > > > > > > Call Trace:
> > > > > > >  <IRQ>  [<ffffffff810d5f38>] warn_alloc_failed+0x117/0x12c
> > > > > 
> > > > > > You could try to load iwlwifi with amsdu_size_8K set to 0 (disable)
> > > > > > 
> > > > > > It should hopefully use order-0 pages
> > > > > 
> > > > > It will, do that then, unfortunately it can't switch at runtime because
> > > > > it advertised this support to the access point or clients.
> > > > 
> > > > What are the drawbacks of setting amsdu_size_8K to 0 by default ?
> > > 
> > > We're discussing this now, the only downside would be that we couldn't
> > > receive 8k A-MSDUs. Thing is, practically nobody uses A-MSDU anyway, and
> > > even when I suspect the difference between 4k and 8k won't be huge.
> > 
> > OTOH, this affects the protocol, and when you really can't allocate any
> > order-1 pages you pointed out yourself that many other things also won't
> > work, so I'm not really sure it makes a big difference if we change the
> > driver?
> 
> That as an unscientific test, but when I did the NFS eats all my pages
> test using ethernet, my system didn't hang like it did with iwlagn.
> 
> So while the NFS code is definitely doing something wrong when it uses
> its default huge buffers, the e1000e code deals with it without hanging
> my system.
> 
> So thanks for trying to improve the iwlagn code to avoid those system
> lockups.

We'll be submitting a patch to make single pages default.

johannes


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
@ 2013-06-18 16:52                                                   ` Eric Dumazet
  0 siblings, 0 replies; 72+ messages in thread
From: Eric Dumazet @ 2013-06-18 16:52 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev

On Tue, 2013-02-26 at 21:54 +0100, Johannes Berg wrote:
> On Fri, 2013-02-22 at 22:14 -0800, Marc MERLIN wrote:
> > On Wed, Feb 20, 2013 at 10:15:03AM +0100, Johannes Berg wrote:
> > > On Wed, 2013-02-20 at 10:12 +0100, Johannes Berg wrote:
> > > > On Tue, 2013-02-19 at 08:21 -0800, Eric Dumazet wrote:
> > > > > On Tue, 2013-02-19 at 11:03 +0100, Johannes Berg wrote:
> > > > > > On Mon, 2013-02-18 at 21:17 -0800, Eric Dumazet wrote:
> > > > > > 
> > > > > > > > chrome: page allocation failure: order:1, mode:0x4020
> > > > > > > > Pid: 8730, comm: chrome Tainted: G           O 3.7.8-amd64-preempt-20121226-fixwd #1
> > > > > > > > Call Trace:
> > > > > > > >  <IRQ>  [<ffffffff810d5f38>] warn_alloc_failed+0x117/0x12c
> > > > > > 
> > > > > > > You could try to load iwlwifi with amsdu_size_8K set to 0 (disable)
> > > > > > > 
> > > > > > > It should hopefully use order-0 pages
> > > > > > 
> > > > > > It will, do that then, unfortunately it can't switch at runtime because
> > > > > > it advertised this support to the access point or clients.
> > > > > 
> > > > > What are the drawbacks of setting amsdu_size_8K to 0 by default ?
> > > > 
> > > > We're discussing this now, the only downside would be that we couldn't
> > > > receive 8k A-MSDUs. Thing is, practically nobody uses A-MSDU anyway, and
> > > > even when I suspect the difference between 4k and 8k won't be huge.
> > > 
> > > OTOH, this affects the protocol, and when you really can't allocate any
> > > order-1 pages you pointed out yourself that many other things also won't
> > > work, so I'm not really sure it makes a big difference if we change the
> > > driver?
> > 
> > That as an unscientific test, but when I did the NFS eats all my pages
> > test using ethernet, my system didn't hang like it did with iwlagn.
> > 
> > So while the NFS code is definitely doing something wrong when it uses
> > its default huge buffers, the e1000e code deals with it without hanging
> > my system.
> > 
> > So thanks for trying to improve the iwlagn code to avoid those system
> > lockups.
> 
> We'll be submitting a patch to make single pages default.

Do you think the same change would be possible for
drivers/net/wireless/iwlegacy/4965-mac.c ?

Thanks !



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
@ 2013-06-18 16:52                                                   ` Eric Dumazet
  0 siblings, 0 replies; 72+ messages in thread
From: Eric Dumazet @ 2013-06-18 16:52 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Marc MERLIN, David Miller, Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ,
	bhutchings-s/n/eUQHGBpZroRs9YW3xA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA

On Tue, 2013-02-26 at 21:54 +0100, Johannes Berg wrote:
> On Fri, 2013-02-22 at 22:14 -0800, Marc MERLIN wrote:
> > On Wed, Feb 20, 2013 at 10:15:03AM +0100, Johannes Berg wrote:
> > > On Wed, 2013-02-20 at 10:12 +0100, Johannes Berg wrote:
> > > > On Tue, 2013-02-19 at 08:21 -0800, Eric Dumazet wrote:
> > > > > On Tue, 2013-02-19 at 11:03 +0100, Johannes Berg wrote:
> > > > > > On Mon, 2013-02-18 at 21:17 -0800, Eric Dumazet wrote:
> > > > > > 
> > > > > > > > chrome: page allocation failure: order:1, mode:0x4020
> > > > > > > > Pid: 8730, comm: chrome Tainted: G           O 3.7.8-amd64-preempt-20121226-fixwd #1
> > > > > > > > Call Trace:
> > > > > > > >  <IRQ>  [<ffffffff810d5f38>] warn_alloc_failed+0x117/0x12c
> > > > > > 
> > > > > > > You could try to load iwlwifi with amsdu_size_8K set to 0 (disable)
> > > > > > > 
> > > > > > > It should hopefully use order-0 pages
> > > > > > 
> > > > > > It will, do that then, unfortunately it can't switch at runtime because
> > > > > > it advertised this support to the access point or clients.
> > > > > 
> > > > > What are the drawbacks of setting amsdu_size_8K to 0 by default ?
> > > > 
> > > > We're discussing this now, the only downside would be that we couldn't
> > > > receive 8k A-MSDUs. Thing is, practically nobody uses A-MSDU anyway, and
> > > > even when I suspect the difference between 4k and 8k won't be huge.
> > > 
> > > OTOH, this affects the protocol, and when you really can't allocate any
> > > order-1 pages you pointed out yourself that many other things also won't
> > > work, so I'm not really sure it makes a big difference if we change the
> > > driver?
> > 
> > That as an unscientific test, but when I did the NFS eats all my pages
> > test using ethernet, my system didn't hang like it did with iwlagn.
> > 
> > So while the NFS code is definitely doing something wrong when it uses
> > its default huge buffers, the e1000e code deals with it without hanging
> > my system.
> > 
> > So thanks for trying to improve the iwlagn code to avoid those system
> > lockups.
> 
> We'll be submitting a patch to make single pages default.

Do you think the same change would be possible for
drivers/net/wireless/iwlegacy/4965-mac.c ?

Thanks !


--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-06-18 16:52                                                   ` Eric Dumazet
  (?)
@ 2013-06-18 17:04                                                   ` Johannes Berg
  2013-06-19 13:09                                                     ` Stanislaw Gruszka
  -1 siblings, 1 reply; 72+ messages in thread
From: Johannes Berg @ 2013-06-18 17:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marc MERLIN, David Miller, Larry.Finger, bhutchings,
	linux-wireless, netdev, sgruszka

On Tue, 2013-06-18 at 09:52 -0700, Eric Dumazet wrote:

> > We'll be submitting a patch to make single pages default.
> 
> Do you think the same change would be possible for
> drivers/net/wireless/iwlegacy/4965-mac.c ?

Yes, Stanislaw?

johannes


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out
  2013-06-18 17:04                                                   ` Johannes Berg
@ 2013-06-19 13:09                                                     ` Stanislaw Gruszka
  0 siblings, 0 replies; 72+ messages in thread
From: Stanislaw Gruszka @ 2013-06-19 13:09 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Eric Dumazet, Marc MERLIN, David Miller, Larry.Finger,
	bhutchings, linux-wireless, netdev

On Tue, Jun 18, 2013 at 07:04:47PM +0200, Johannes Berg wrote:
> On Tue, 2013-06-18 at 09:52 -0700, Eric Dumazet wrote:
> 
> > > We'll be submitting a patch to make single pages default.
> > 
> > Do you think the same change would be possible for
> > drivers/net/wireless/iwlegacy/4965-mac.c ?
> 
> Yes, Stanislaw?

Yes, I'll post 4965 patch.

Thanks
Stanislaw

^ permalink raw reply	[flat|nested] 72+ messages in thread

end of thread, other threads:[~2013-06-19 13:08 UTC | newest]

Thread overview: 72+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-29 16:38 3.2.8/amd64 full interrupt hangs and deadlocks under big network copies (page allocation failure) Marc MERLIN
2012-03-11 18:32 ` Marc MERLIN
2012-03-29  5:31   ` Marc MERLIN
2012-03-29  6:49     ` David Miller
2012-03-29 14:05       ` Marc MERLIN
2012-03-29 16:41     ` Marc MERLIN
2012-03-29 16:41       ` Marc MERLIN
2012-03-29 18:09 ` Ben Hutchings
2012-03-29 21:19   ` Marc MERLIN
2012-03-29 21:19     ` Marc MERLIN
2012-04-09 17:20   ` Marc MERLIN
2012-04-09 18:12     ` David Miller
2012-04-09 18:36       ` Marc MERLIN
2012-04-09 18:36         ` Marc MERLIN
2012-04-09 18:37         ` David Miller
2012-04-09 18:58           ` Larry Finger
2012-04-09 19:11             ` Eric Dumazet
2012-04-09 19:34               ` David Miller
2012-04-09 19:46                 ` Marc MERLIN
2012-04-09 19:46                   ` Marc MERLIN
2012-04-10  3:56                 ` Eric Dumazet
2012-04-10  5:11                   ` Marc MERLIN
2012-04-10  6:11                     ` Eric Dumazet
2012-04-11  5:27                       ` Marc MERLIN
2012-04-11  5:43                         ` Eric Dumazet
2012-04-11  5:43                           ` Eric Dumazet
2012-04-11  6:30                           ` [PATCH] tcp: avoid order-1 allocations on wifi and tx path Eric Dumazet
2012-04-11  6:30                             ` Eric Dumazet
2012-04-11  7:38                             ` Eric Dumazet
2012-04-11  7:38                               ` Eric Dumazet
2012-04-11 14:12                               ` David Miller
2012-04-11 14:12                                 ` David Miller
2012-04-11 14:11                             ` David Miller
2012-04-11  6:08                         ` [PATCH] net: allow pskb_expand_head() to get maximum tailroom Eric Dumazet
2012-04-11 14:11                           ` David Miller
2012-04-11 14:11                             ` David Miller
2012-07-15 21:59                         ` 3.4.4/amd64 full interrupt hangs under big nfs copies Marc MERLIN
2012-07-16  6:18                           ` Eric Dumazet
2012-07-16 15:18                             ` Marc MERLIN
2012-07-16 16:21                               ` Eric Dumazet
2012-07-16 17:17                                 ` Marc MERLIN
2013-02-19  4:05                                 ` 3.7.8/amd64 full interrupt hangs due to iwlwifi under big nfs copies out Marc MERLIN
2013-02-19  5:17                                   ` Eric Dumazet
2013-02-19  5:26                                     ` Marc MERLIN
2013-02-19 10:03                                     ` Johannes Berg
2013-02-19 16:18                                       ` Marc MERLIN
2013-02-19 16:36                                         ` Eric Dumazet
2013-02-19 16:21                                       ` Eric Dumazet
2013-02-20  9:12                                         ` Johannes Berg
2013-02-20  9:15                                           ` Johannes Berg
2013-02-20 15:11                                             ` Eric Dumazet
2013-02-20 16:20                                               ` Johannes Berg
2013-02-20 16:20                                                 ` Johannes Berg
2013-02-20 16:55                                                 ` Eric Dumazet
2013-02-20 16:55                                                   ` Eric Dumazet
2013-02-20 16:59                                                   ` Johannes Berg
2013-02-20 17:39                                                     ` Eric Dumazet
2013-02-20 17:01                                                   ` Johannes Berg
2013-02-20 17:24                                                     ` Eric Dumazet
2013-02-20 18:16                                                       ` Johannes Berg
2013-02-20 19:17                                                         ` Eric Dumazet
2013-02-20 19:58                                                           ` Johannes Berg
2013-02-20 19:58                                                             ` Johannes Berg
2013-02-20 20:14                                                             ` Eric Dumazet
2013-02-20 20:27                                                               ` Johannes Berg
2013-02-20 20:09                                               ` Johannes Berg
2013-02-23  6:14                                             ` Marc MERLIN
2013-02-26 20:54                                               ` Johannes Berg
2013-06-18 16:52                                                 ` Eric Dumazet
2013-06-18 16:52                                                   ` Eric Dumazet
2013-06-18 17:04                                                   ` Johannes Berg
2013-06-19 13:09                                                     ` Stanislaw Gruszka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.