linux-kernel.vger.kernel.org archive mirror
* page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
@ 2016-12-13 17:11 David Arendt
  2016-12-13 19:55 ` Xin Zhou
  2016-12-14 10:17 ` Michal Hocko
  0 siblings, 2 replies; 9+ messages in thread
From: David Arendt @ 2016-12-13 17:11 UTC (permalink / raw)
  To: linux-btrfs, linux-kernel

Hi,

I am seeing the following page allocation stall while copying lots of
large files from one btrfs hdd to another.

Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls for
12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8
Tainted: P           O    4.9.0 #1
Dec 13 13:04:29 server kernel: Hardware name: ASUS All Series/H87M-PRO,
BIOS 2102 10/28/2014
Dec 13 13:04:29 server kernel: Workqueue: btrfs-extent-refs
btrfs_extent_refs_helper
Dec 13 13:04:29 server kernel:  0000000000000000 ffffffff813f3a59
ffffffff81976b28 ffffc90011093750
Dec 13 13:04:29 server kernel:  ffffffff81114fc1 02400840f39b6bc0
ffffffff81976b28 ffffc900110936f8
Dec 13 13:04:29 server kernel:  ffff880700000010 ffffc90011093760
ffffc90011093710 0000000002400840
Dec 13 13:04:29 server kernel: Call Trace:
Dec 13 13:04:29 server kernel:  [<ffffffff813f3a59>] ? dump_stack+0x46/0x5d
Dec 13 13:04:29 server kernel:  [<ffffffff81114fc1>] ?
warn_alloc+0x111/0x130
Dec 13 13:04:33 server kernel:  [<ffffffff81115c38>] ?
__alloc_pages_nodemask+0xbe8/0xd30
Dec 13 13:04:33 server kernel:  [<ffffffff8110de74>] ?
pagecache_get_page+0xe4/0x230
Dec 13 13:04:33 server kernel:  [<ffffffff81323a8b>] ?
alloc_extent_buffer+0x10b/0x400
Dec 13 13:04:33 server kernel:  [<ffffffff812ef8c5>] ?
btrfs_alloc_tree_block+0x125/0x560
Dec 13 13:04:33 server kernel:  [<ffffffff8132442f>] ?
read_extent_buffer_pages+0x21f/0x280
Dec 13 13:04:33 server kernel:  [<ffffffff812d81f1>] ?
__btrfs_cow_block+0x141/0x580
Dec 13 13:04:33 server kernel:  [<ffffffff812d87b0>] ?
btrfs_cow_block+0x100/0x150
Dec 13 13:04:33 server kernel:  [<ffffffff812dc1d9>] ?
btrfs_search_slot+0x1e9/0x9c0
Dec 13 13:04:33 server kernel:  [<ffffffff8131ead2>] ?
__set_extent_bit+0x512/0x550
Dec 13 13:04:33 server kernel:  [<ffffffff812e1ab5>] ?
lookup_inline_extent_backref+0xf5/0x5e0
Dec 13 13:04:34 server kernel:  [<ffffffff8131f0a4>] ?
set_extent_bit+0x24/0x30
Dec 13 13:04:34 server kernel:  [<ffffffff812e4334>] ?
update_block_group.isra.34+0x114/0x380
Dec 13 13:04:34 server kernel:  [<ffffffff812e4694>] ?
__btrfs_free_extent.isra.35+0xf4/0xd20
Dec 13 13:04:34 server kernel:  [<ffffffff8134d561>] ?
btrfs_merge_delayed_refs+0x61/0x5d0
Dec 13 13:04:34 server kernel:  [<ffffffff812e8bd2>] ?
__btrfs_run_delayed_refs+0x902/0x10a0
Dec 13 13:04:34 server kernel:  [<ffffffff812ec0f0>] ?
btrfs_run_delayed_refs+0x90/0x2a0
Dec 13 13:04:34 server kernel:  [<ffffffff812ec384>] ?
delayed_ref_async_start+0x84/0xa0
Dec 13 13:04:34 server kernel:  [<ffffffff8109c16d>] ?
process_one_work+0x11d/0x3b0
Dec 13 13:04:34 server kernel:  [<ffffffff8109c442>] ?
worker_thread+0x42/0x4b0
Dec 13 13:04:34 server kernel:  [<ffffffff8109c400>] ?
process_one_work+0x3b0/0x3b0
Dec 13 13:04:34 server kernel:  [<ffffffff8109c400>] ?
process_one_work+0x3b0/0x3b0
Dec 13 13:04:34 server kernel:  [<ffffffff8108ba5e>] ?
do_group_exit+0x2e/0xa0
Dec 13 13:04:34 server kernel:  [<ffffffff810a12d9>] ? kthread+0xb9/0xd0
Dec 13 13:04:34 server kernel:  [<ffffffff810a1220>] ?
kthread_park+0x50/0x50
Dec 13 13:04:34 server kernel:  [<ffffffff81798cd2>] ?
ret_from_fork+0x22/0x30
Dec 13 13:04:34 server kernel: Mem-Info:
Dec 13 13:04:34 server kernel: active_anon:20 inactive_anon:34
isolated_anon:0 active_file:7370032 inactive_file:450105
isolated_file:320 unevictable:0 dirty:522748 writeback:189
unstable:0 slab_reclaimable:178255 slab_unreclaimable:124617
mapped:4236 shmem:0 pagetables:1163 bounce:0 free:38224 free_pcp:241
free_cma:0
Dec 13 13:04:34 server kernel: Node 0 active_anon:80kB
inactive_anon:136kB active_file:29480128kB inactive_file:1800420kB
unevictable:0kB isolated(anon):0kB isolated(file):1280kB mapped:16944kB
dirty:2090992kB writeback:756kB shmem:0kB writeback_tmp:0kB unstable:0kB
pages_scanned:258821 all_unreclaimable? no
Dec 13 13:04:34 server kernel: DMA free:15868kB min:8kB low:20kB
high:32kB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB present:15976kB
managed:15892kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:24kB
kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
free_cma:0kB
Dec 13 13:04:34 server kernel: lowmem_reserve[]: 0 3428 32019 32019
Dec 13 13:04:34 server kernel: DMA32 free:116800kB min:2448kB low:5956kB
high:9464kB active_anon:0kB inactive_anon:0kB active_file:3087928kB
inactive_file:191336kB unevictable:0kB writepending:221828kB
present:3590832kB managed:3513936kB mlocked:0kB slab_reclaimable:93252kB
slab_unreclaimable:20520kB kernel_stack:48kB pagetables:212kB bounce:0kB
free_pcp:4kB local_pcp:0kB free_cma:0kB
Dec 13 13:04:34 server kernel: lowmem_reserve[]: 0 0 0 0
Dec 13 13:04:34 server kernel: DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U)
1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U)
1*2048kB (M) 3*4096kB (M) = 15868kB
Dec 13 13:04:34 server kernel: DMA32: 940*4kB (UME) 4006*8kB (UME)
3308*16kB (UME) 791*32kB (UME) 41*64kB (UE) 1*128kB (U) 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 116800kB
Dec 13 13:04:34 server kernel: Normal: 75*4kB (E) 192*8kB (UE) 94*16kB
(UME) 57*32kB (U) 33*64kB (UM) 16*128kB (UM) 10*256kB (UM) 4*512kB (U)
0*1024kB 1*2048kB (U) 1*4096kB (U) = 20076kB
Dec 13 13:04:34 server kernel: Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
Dec 13 13:04:34 server kernel: 7820441 total pagecache pages
Dec 13 13:04:34 server kernel: 69 pages in swap cache
Dec 13 13:04:34 server kernel: Swap cache stats: add 60282, delete
60213, find 249865/258319
Dec 13 13:04:34 server kernel: Free swap  = 62482976kB
Dec 13 13:04:34 server kernel: Total swap = 62522692kB
Dec 13 13:04:34 server kernel: 8364614 pages RAM
Dec 13 13:04:34 server kernel: 0 pages HighMem/MovableOnly
Dec 13 13:04:34 server kernel: 162971 pages reserved

Does anyone have any idea what could be going wrong here?

Thanks in advance,

David Arendt


* Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
  2016-12-13 17:11 page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another David Arendt
@ 2016-12-13 19:55 ` Xin Zhou
  2016-12-13 20:26   ` David Arendt
  2016-12-14 10:17 ` Michal Hocko
  1 sibling, 1 reply; 9+ messages in thread
From: Xin Zhou @ 2016-12-13 19:55 UTC (permalink / raw)
  To: David Arendt; +Cc: linux-btrfs, linux-kernel

Hi David,

The allocation has the GFP_NOFS flag; according to its definition, the
issue might have happened during the initial disk I/O.

By the way, did you get a chance to dump the meminfo and run "top" before the system hung?
It seems more information about the system's running state is needed to understand the issue. Thanks.

Xin

 

Sent: Tuesday, December 13, 2016 at 9:11 AM
From: "David Arendt" <admin@prnet.org>
To: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
Hi,

I receive the following page allocation stall while copying lots of
large files from one btrfs hdd to another.

Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls for
12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8
Tainted: P O 4.9.0 #1
Dec 13 13:04:29 server kernel: Hardware name: ASUS All Series/H87M-PRO,
BIOS 2102 10/28/2014
Dec 13 13:04:29 server kernel: Workqueue: btrfs-extent-refs
btrfs_extent_refs_helper
Dec 13 13:04:29 server kernel: 0000000000000000 ffffffff813f3a59
ffffffff81976b28 ffffc90011093750
Dec 13 13:04:29 server kernel: ffffffff81114fc1 02400840f39b6bc0
ffffffff81976b28 ffffc900110936f8
Dec 13 13:04:29 server kernel: ffff880700000010 ffffc90011093760
ffffc90011093710 0000000002400840
Dec 13 13:04:29 server kernel: Call Trace:
Dec 13 13:04:29 server kernel: [<ffffffff813f3a59>] ? dump_stack+0x46/0x5d
Dec 13 13:04:29 server kernel: [<ffffffff81114fc1>] ?
warn_alloc+0x111/0x130
Dec 13 13:04:33 server kernel: [<ffffffff81115c38>] ?
__alloc_pages_nodemask+0xbe8/0xd30
Dec 13 13:04:33 server kernel: [<ffffffff8110de74>] ?
pagecache_get_page+0xe4/0x230
Dec 13 13:04:33 server kernel: [<ffffffff81323a8b>] ?
alloc_extent_buffer+0x10b/0x400
Dec 13 13:04:33 server kernel: [<ffffffff812ef8c5>] ?
btrfs_alloc_tree_block+0x125/0x560
Dec 13 13:04:33 server kernel: [<ffffffff8132442f>] ?
read_extent_buffer_pages+0x21f/0x280
Dec 13 13:04:33 server kernel: [<ffffffff812d81f1>] ?
__btrfs_cow_block+0x141/0x580
Dec 13 13:04:33 server kernel: [<ffffffff812d87b0>] ?
btrfs_cow_block+0x100/0x150
Dec 13 13:04:33 server kernel: [<ffffffff812dc1d9>] ?
btrfs_search_slot+0x1e9/0x9c0
Dec 13 13:04:33 server kernel: [<ffffffff8131ead2>] ?
__set_extent_bit+0x512/0x550
Dec 13 13:04:33 server kernel: [<ffffffff812e1ab5>] ?
lookup_inline_extent_backref+0xf5/0x5e0
Dec 13 13:04:34 server kernel: [<ffffffff8131f0a4>] ?
set_extent_bit+0x24/0x30
Dec 13 13:04:34 server kernel: [<ffffffff812e4334>] ?
update_block_group.isra.34+0x114/0x380
Dec 13 13:04:34 server kernel: [<ffffffff812e4694>] ?
__btrfs_free_extent.isra.35+0xf4/0xd20
Dec 13 13:04:34 server kernel: [<ffffffff8134d561>] ?
btrfs_merge_delayed_refs+0x61/0x5d0
Dec 13 13:04:34 server kernel: [<ffffffff812e8bd2>] ?
__btrfs_run_delayed_refs+0x902/0x10a0
Dec 13 13:04:34 server kernel: [<ffffffff812ec0f0>] ?
btrfs_run_delayed_refs+0x90/0x2a0
Dec 13 13:04:34 server kernel: [<ffffffff812ec384>] ?
delayed_ref_async_start+0x84/0xa0
Dec 13 13:04:34 server kernel: [<ffffffff8109c16d>] ?
process_one_work+0x11d/0x3b0
Dec 13 13:04:34 server kernel: [<ffffffff8109c442>] ?
worker_thread+0x42/0x4b0
Dec 13 13:04:34 server kernel: [<ffffffff8109c400>] ?
process_one_work+0x3b0/0x3b0
Dec 13 13:04:34 server kernel: [<ffffffff8109c400>] ?
process_one_work+0x3b0/0x3b0
Dec 13 13:04:34 server kernel: [<ffffffff8108ba5e>] ?
do_group_exit+0x2e/0xa0
Dec 13 13:04:34 server kernel: [<ffffffff810a12d9>] ? kthread+0xb9/0xd0
Dec 13 13:04:34 server kernel: [<ffffffff810a1220>] ?
kthread_park+0x50/0x50
Dec 13 13:04:34 server kernel: [<ffffffff81798cd2>] ?
ret_from_fork+0x22/0x30
Dec 13 13:04:34 server kernel: Mem-Info:
Dec 13 13:04:34 server kernel: active_anon:20 inactive_anon:34
isolated_anon:0\x0a active_file:7370032 inactive_file:450105
isolated_file:320\x0a unevictable:0 dirty:522748 writeback:189
unstable:0\x0a slab_reclaimable:178255 slab_unreclaimable:124617\x0a
mapped:4236 shmem:0 pagetables:1163 bounce:0\x0a free:38224 free_pcp:241
free_cma:0
Dec 13 13:04:34 server kernel: Node 0 active_anon:80kB
inactive_anon:136kB active_file:29480128kB inactive_file:1800420kB
unevictable:0kB isolated(anon):0kB isolated(file):1280kB mapped:16944kB
dirty:2090992kB writeback:756kB shmem:0kB writeback_tmp:0kB unstable:0kB
pages_scanned:258821 all_unreclaimable? no
Dec 13 13:04:34 server kernel: DMA free:15868kB min:8kB low:20kB
high:32kB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB present:15976kB
managed:15892kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:24kB
kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
free_cma:0kB
Dec 13 13:04:34 server kernel: lowmem_reserve[]: 0 3428 32019 32019
Dec 13 13:04:34 server kernel: DMA32 free:116800kB min:2448kB low:5956kB
high:9464kB active_anon:0kB inactive_anon:0kB active_file:3087928kB
inactive_file:191336kB unevictable:0kB writepending:221828kB
present:3590832kB managed:3513936kB mlocked:0kB slab_reclaimable:93252kB
slab_unreclaimable:20520kB kernel_stack:48kB pagetables:212kB bounce:0kB
free_pcp:4kB local_pcp:0kB free_cma:0kB
Dec 13 13:04:34 server kernel: lowmem_reserve[]: 0 0 0 0
Dec 13 13:04:34 server kernel: DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U)
1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U)
1*2048kB (M) 3*4096kB (M) = 15868kB
Dec 13 13:04:34 server kernel: DMA32: 940*4kB (UME) 4006*8kB (UME)
3308*16kB (UME) 791*32kB (UME) 41*64kB (UE) 1*128kB (U) 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 116800kB
Dec 13 13:04:34 server kernel: Normal: 75*4kB (E) 192*8kB (UE) 94*16kB
(UME) 57*32kB (U) 33*64kB (UM) 16*128kB (UM) 10*256kB (UM) 4*512kB (U)
0*1024kB 1*2048kB (U) 1*4096kB (U) = 20076kB
Dec 13 13:04:34 server kernel: Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
Dec 13 13:04:34 server kernel: 7820441 total pagecache pages
Dec 13 13:04:34 server kernel: 69 pages in swap cache
Dec 13 13:04:34 server kernel: Swap cache stats: add 60282, delete
60213, find 249865/258319
Dec 13 13:04:34 server kernel: Free swap = 62482976kB
Dec 13 13:04:34 server kernel: Total swap = 62522692kB
Dec 13 13:04:34 server kernel: 8364614 pages RAM
Dec 13 13:04:34 server kernel: 0 pages HighMem/MovableOnly
Dec 13 13:04:34 server kernel: 162971 pages reserved

Has anyone any idea what could go wrong here ?

Thanks in advance,

David Arendt



* Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
  2016-12-13 19:55 ` Xin Zhou
@ 2016-12-13 20:26   ` David Arendt
  2016-12-13 23:28     ` Duncan
  0 siblings, 1 reply; 9+ messages in thread
From: David Arendt @ 2016-12-13 20:26 UTC (permalink / raw)
  To: Xin Zhou; +Cc: linux-btrfs, linux-kernel

Hi,

unfortunately I did not dump meminfo before the crash.

Here is the current meminfo, with the copy now running for about 3
hours.

MemTotal:       32806572 kB
MemFree:          197336 kB
MemAvailable:   31226888 kB
Buffers:              52 kB
Cached:         30603160 kB
SwapCached:        11880 kB
Active:         29015008 kB
Inactive:        2017292 kB
Active(anon):     162124 kB
Inactive(anon):   285104 kB
Active(file):   28852884 kB
Inactive(file):  1732188 kB
Unevictable:        7092 kB
Mlocked:            7092 kB
SwapTotal:      62522692 kB
SwapFree:       62460464 kB
Dirty:            231944 kB
Writeback:             0 kB
AnonPages:        425160 kB
Mapped:           227656 kB
Shmem:             12160 kB
Slab:            1380280 kB
SReclaimable:     774584 kB
SUnreclaim:       605696 kB
KernelStack:        7840 kB
PageTables:        12800 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    78925976 kB
Committed_AS:    1883256 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:    20220592 kB
DirectMap2M:    13238272 kB
DirectMap1G:     1048576 kB

I will write a cron job that dumps meminfo to a file every 5 minutes, so
that I have more information for the next crash; something along the
lines of the sketch below.
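
A rough sketch of what I have in mind (the log paths and the use of an
/etc/cron.d entry are just my choices, not tested yet):

# /etc/cron.d/memdump: capture memory state every 5 minutes
*/5 * * * * root date >> /var/log/meminfo.log && cat /proc/meminfo >> /var/log/meminfo.log
*/5 * * * * root top -b -n 1 | head -n 25 >> /var/log/top.log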

This crash is not an isolated one, as I have already hit it multiple
times with -rc7 and -rc8. It seems to occur only when copying from
7200rpm hard disks to 5600rpm ones, and never when copying between two
7200rpm or two 5400rpm disks.

Thanks,
David Arendt

On 12/13/2016 08:55 PM, Xin Zhou wrote:
> Hi David,
>
> It has GFP_NOFS flags, according to definition,
> the issue might have happened during initial DISK/IO.
>
> By the way, did you get a chance to dump the meminfo and run "top" before the system hang?
> It seems more info about the system running state needed to know the issue. Thanks.
>
> Xin
>
>  
>
> Sent: Tuesday, December 13, 2016 at 9:11 AM
> From: "David Arendt" <admin@prnet.org>
> To: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org
> Subject: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
> Hi,
>
> I receive the following page allocation stall while copying lots of
> large files from one btrfs hdd to another.
>
> Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls for
> 12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
> Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8
> Tainted: P O 4.9.0 #1
> Dec 13 13:04:29 server kernel: Hardware name: ASUS All Series/H87M-PRO,
> BIOS 2102 10/28/2014
> Dec 13 13:04:29 server kernel: Workqueue: btrfs-extent-refs
> btrfs_extent_refs_helper
> Dec 13 13:04:29 server kernel: 0000000000000000 ffffffff813f3a59
> ffffffff81976b28 ffffc90011093750
> Dec 13 13:04:29 server kernel: ffffffff81114fc1 02400840f39b6bc0
> ffffffff81976b28 ffffc900110936f8
> Dec 13 13:04:29 server kernel: ffff880700000010 ffffc90011093760
> ffffc90011093710 0000000002400840
> Dec 13 13:04:29 server kernel: Call Trace:
> Dec 13 13:04:29 server kernel: [<ffffffff813f3a59>] ? dump_stack+0x46/0x5d
> Dec 13 13:04:29 server kernel: [<ffffffff81114fc1>] ?
> warn_alloc+0x111/0x130
> Dec 13 13:04:33 server kernel: [<ffffffff81115c38>] ?
> __alloc_pages_nodemask+0xbe8/0xd30
> Dec 13 13:04:33 server kernel: [<ffffffff8110de74>] ?
> pagecache_get_page+0xe4/0x230
> Dec 13 13:04:33 server kernel: [<ffffffff81323a8b>] ?
> alloc_extent_buffer+0x10b/0x400
> Dec 13 13:04:33 server kernel: [<ffffffff812ef8c5>] ?
> btrfs_alloc_tree_block+0x125/0x560
> Dec 13 13:04:33 server kernel: [<ffffffff8132442f>] ?
> read_extent_buffer_pages+0x21f/0x280
> Dec 13 13:04:33 server kernel: [<ffffffff812d81f1>] ?
> __btrfs_cow_block+0x141/0x580
> Dec 13 13:04:33 server kernel: [<ffffffff812d87b0>] ?
> btrfs_cow_block+0x100/0x150
> Dec 13 13:04:33 server kernel: [<ffffffff812dc1d9>] ?
> btrfs_search_slot+0x1e9/0x9c0
> Dec 13 13:04:33 server kernel: [<ffffffff8131ead2>] ?
> __set_extent_bit+0x512/0x550
> Dec 13 13:04:33 server kernel: [<ffffffff812e1ab5>] ?
> lookup_inline_extent_backref+0xf5/0x5e0
> Dec 13 13:04:34 server kernel: [<ffffffff8131f0a4>] ?
> set_extent_bit+0x24/0x30
> Dec 13 13:04:34 server kernel: [<ffffffff812e4334>] ?
> update_block_group.isra.34+0x114/0x380
> Dec 13 13:04:34 server kernel: [<ffffffff812e4694>] ?
> __btrfs_free_extent.isra.35+0xf4/0xd20
> Dec 13 13:04:34 server kernel: [<ffffffff8134d561>] ?
> btrfs_merge_delayed_refs+0x61/0x5d0
> Dec 13 13:04:34 server kernel: [<ffffffff812e8bd2>] ?
> __btrfs_run_delayed_refs+0x902/0x10a0
> Dec 13 13:04:34 server kernel: [<ffffffff812ec0f0>] ?
> btrfs_run_delayed_refs+0x90/0x2a0
> Dec 13 13:04:34 server kernel: [<ffffffff812ec384>] ?
> delayed_ref_async_start+0x84/0xa0
> Dec 13 13:04:34 server kernel: [<ffffffff8109c16d>] ?
> process_one_work+0x11d/0x3b0
> Dec 13 13:04:34 server kernel: [<ffffffff8109c442>] ?
> worker_thread+0x42/0x4b0
> Dec 13 13:04:34 server kernel: [<ffffffff8109c400>] ?
> process_one_work+0x3b0/0x3b0
> Dec 13 13:04:34 server kernel: [<ffffffff8109c400>] ?
> process_one_work+0x3b0/0x3b0
> Dec 13 13:04:34 server kernel: [<ffffffff8108ba5e>] ?
> do_group_exit+0x2e/0xa0
> Dec 13 13:04:34 server kernel: [<ffffffff810a12d9>] ? kthread+0xb9/0xd0
> Dec 13 13:04:34 server kernel: [<ffffffff810a1220>] ?
> kthread_park+0x50/0x50
> Dec 13 13:04:34 server kernel: [<ffffffff81798cd2>] ?
> ret_from_fork+0x22/0x30
> Dec 13 13:04:34 server kernel: Mem-Info:
> Dec 13 13:04:34 server kernel: active_anon:20 inactive_anon:34
> isolated_anon:0\x0a active_file:7370032 inactive_file:450105
> isolated_file:320\x0a unevictable:0 dirty:522748 writeback:189
> unstable:0\x0a slab_reclaimable:178255 slab_unreclaimable:124617\x0a
> mapped:4236 shmem:0 pagetables:1163 bounce:0\x0a free:38224 free_pcp:241
> free_cma:0
> Dec 13 13:04:34 server kernel: Node 0 active_anon:80kB
> inactive_anon:136kB active_file:29480128kB inactive_file:1800420kB
> unevictable:0kB isolated(anon):0kB isolated(file):1280kB mapped:16944kB
> dirty:2090992kB writeback:756kB shmem:0kB writeback_tmp:0kB unstable:0kB
> pages_scanned:258821 all_unreclaimable? no
> Dec 13 13:04:34 server kernel: DMA free:15868kB min:8kB low:20kB
> high:32kB active_anon:0kB inactive_anon:0kB active_file:0kB
> inactive_file:0kB unevictable:0kB writepending:0kB present:15976kB
> managed:15892kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:24kB
> kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
> free_cma:0kB
> Dec 13 13:04:34 server kernel: lowmem_reserve[]: 0 3428 32019 32019
> Dec 13 13:04:34 server kernel: DMA32 free:116800kB min:2448kB low:5956kB
> high:9464kB active_anon:0kB inactive_anon:0kB active_file:3087928kB
> inactive_file:191336kB unevictable:0kB writepending:221828kB
> present:3590832kB managed:3513936kB mlocked:0kB slab_reclaimable:93252kB
> slab_unreclaimable:20520kB kernel_stack:48kB pagetables:212kB bounce:0kB
> free_pcp:4kB local_pcp:0kB free_cma:0kB
> Dec 13 13:04:34 server kernel: lowmem_reserve[]: 0 0 0 0
> Dec 13 13:04:34 server kernel: DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U)
> 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U)
> 1*2048kB (M) 3*4096kB (M) = 15868kB
> Dec 13 13:04:34 server kernel: DMA32: 940*4kB (UME) 4006*8kB (UME)
> 3308*16kB (UME) 791*32kB (UME) 41*64kB (UE) 1*128kB (U) 0*256kB 0*512kB
> 0*1024kB 0*2048kB 0*4096kB = 116800kB
> Dec 13 13:04:34 server kernel: Normal: 75*4kB (E) 192*8kB (UE) 94*16kB
> (UME) 57*32kB (U) 33*64kB (UM) 16*128kB (UM) 10*256kB (UM) 4*512kB (U)
> 0*1024kB 1*2048kB (U) 1*4096kB (U) = 20076kB
> Dec 13 13:04:34 server kernel: Node 0 hugepages_total=0 hugepages_free=0
> hugepages_surp=0 hugepages_size=2048kB
> Dec 13 13:04:34 server kernel: 7820441 total pagecache pages
> Dec 13 13:04:34 server kernel: 69 pages in swap cache
> Dec 13 13:04:34 server kernel: Swap cache stats: add 60282, delete
> 60213, find 249865/258319
> Dec 13 13:04:34 server kernel: Free swap = 62482976kB
> Dec 13 13:04:34 server kernel: Total swap = 62522692kB
> Dec 13 13:04:34 server kernel: 8364614 pages RAM
> Dec 13 13:04:34 server kernel: 0 pages HighMem/MovableOnly
> Dec 13 13:04:34 server kernel: 162971 pages reserved
>
> Has anyone any idea what could go wrong here ?
>
> Thanks in advance,
>
> David Arendt
>


* Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
  2016-12-13 20:26   ` David Arendt
@ 2016-12-13 23:28     ` Duncan
  0 siblings, 0 replies; 9+ messages in thread
From: Duncan @ 2016-12-13 23:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-btrfs

David Arendt posted on Tue, 13 Dec 2016 21:26:04 +0100 as excerpted:

> The crash is not an isolated one as I already had this crash multiple
> times with -rc7 and -rc8. It seems only to occur when copying from
> 7200rpm harddisks to 5600rpm ones, and never when copying between two
> 7200rpm or two 5400rpm.

That reads very much like a bug previously reported here and on LKML 
itself (with Linus and other high-level kernel devs responding) that 
resulted in a(nother) discussion of whether the write-cache knob 
defaults in /proc/sys/vm/dirty_* should be updated.

It's generally accepted wisdom among kernel devs and sysadmins[1] that 
the existing dirty_* write-cache defaults, set at a time when common 
system memories were measured in MiB rather than the GiB of today, are 
no longer appropriate and should be lowered.  But there is no agreement 
on precisely what the new settings should be, and between inertia and 
the lack of practical pressure (those who know about the problem have 
long since adjusted their own systems accordingly), the now generally 
agreed to be inappropriate defaults continue to remain. =:^(

These knobs can be tweaked in several ways.  For temporary 
experimentation, it's generally easiest to write (as root) updated values 
directly to the /proc/sys/vm/dirty_* files themselves.  Once you find 
values you are comfortable with, most distros have an existing sysctl 
config[2] that can be altered as appropriate, so the settings get 
reapplied at each boot.
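
For instance, a quick way to test lower values on the live system (as 
root; the specific numbers here are just the ones I use below, not a 
recommendation for your hardware) would be something like:

# temporary, takes effect immediately, lost on reboot
echo 3 > /proc/sys/vm/dirty_ratio
echo 1 > /proc/sys/vm/dirty_background_ratio
# or equivalently via the sysctl tool
sysctl -w vm.dirty_ratio=3 vm.dirty_background_ratio=1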

Various articles with the details are easily googled so I'll be brief 
here, but here are the apropos settings and comments from my own
/etc/sysctl.conf, with a brief explanation:

# write-cache, foreground/background flushing
# vm.dirty_ratio = 10 (% of RAM)
# make it 3% of 16G ~ half a gig
vm.dirty_ratio = 3
# vm.dirty_bytes = 0

# vm.dirty_background_ratio = 5 (% of RAM)
# make it 1% of 16G ~ 160 M
vm.dirty_background_ratio = 1
# vm.dirty_background_bytes = 0

# vm.dirty_expire_centisecs = 2999 (30 sec)
# vm.dirty_writeback_centisecs = 499 (5 sec)
# make it 10 sec
vm.dirty_writeback_centisecs = 1000


The *_bytes and *_ratio files configure the same thing in different ways, 
ratio being percentage of RAM, bytes being... bytes.  Set one or the 
other as you prefer and the other one will be automatically zeroed out.  
The vm.dirty_background_* settings control when the kernel starts lower 
priority flushing, while high priority vm.dirty_* (not background) 
settings control when the kernel forces threads trying to do further 
writes to wait until some currently in-flight writes are completed.

But those values are size thresholds, and they only apply until the 
expiry time is reached, at which point writeback is forced regardless 
of size.  That's where the expiry setting (vm.dirty_expire_centisecs) 
comes in.

The problem is that memory has gotten bigger much faster than the speed 
of actually writing out to slow spinning rust has increased. (Fast ssds 
have far fewer issues in this regard, tho slow flash like common USB 
thumb drives remains affected, indeed, sometimes even more so.)  Common 
random-write spinning rust write speeds are around 100 MiB/sec and may 
be as low as 30 MiB/sec.  Meanwhile, the default 10% dirty_ratio, at 16 
GiB memory size, approaches[3] 1.6 GiB, ~1600 MiB.  At 100 MiB/sec 
that's 16 seconds worth of writeback to clear.  At 30 MiB/sec, that's... 
well beyond the 30 second expiry time!
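
If you want to see where your own system stands at any given moment, a 
rough back-of-the-envelope check (assuming the same 30 MiB/sec 
worst-case drain rate as above) is:

# how much is dirty right now, and roughly how long to drain at 30 MiB/sec
awk '/^Dirty:/ {printf "%d MiB dirty, ~%d seconds at 30 MiB/sec\n", $2/1024, $2/1024/30}' /proc/meminfo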

To be clear, there's still a bug if the system crashes as a result -- the 
normal case should simply be a system that at worst doesn't respond for 
the writeback period.  That is certainly a problem in itself when the 
period exceeds double-digit seconds, but surely less of one than a total 
crash, as long as the system /does/ come back after perhaps half a 
minute or so.

Anyway, as you can see from the above excerpt from my own sysctl.conf, 
for my 16 GiB system, I use a much more reasonable 1% background writeback 
trigger, ~160 MiB on 16 GiB, and 3% high-priority/foreground, ~ half a 
GiB on 16 GiB.  I actually set those long ago, before I switched to btrfs 
and before I switched to ssd as well, but even tho ssd should work far 
better with the defaults than spinning rust does, those settings don't 
hurt on ssd either, and I've seen no reason to change them.

So try 1% background and 3% foreground flushing ratios on your 32 GiB 
system as well, and see if that helps, or possibly try setting the _bytes 
values instead, since 1% is still quite huge in writeback-time terms on 
32 GiB.  Tweaking those down certainly helped with the previously 
reported bug, as the reporter couldn't reproduce it afterward, and it 
looks like you're running 2+ GiB dirty based on the Mem-Info in your 
posted trace, so it should reduce that and hopefully eliminate the 
trigger for you, tho of course it won't fix the root bug.  As I said, it 
shouldn't crash in any case, even if it goes unresponsive for half a 
minute or so at a time, so there's certainly a bug to fix, but this 
should hopefully let you work without running into it.
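
If you go the _bytes route instead, something like this in sysctl.conf 
would be a starting point (these particular numbers are only an 
illustration I haven't tested on 32 GiB, not a recommendation):

# absolute limits instead of ratios (128 MiB background, 512 MiB foreground);
# setting these zeroes out the corresponding *_ratio knobs
vm.dirty_background_bytes = 134217728
vm.dirty_bytes = 536870912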

Again, you can write the new values direct to the proc interface without 
rebooting, for experimentation.  Once you find values appropriate for 
you, however, write them to sysctl.conf or whatever your distro uses 
instead, so they get applied automatically at each boot.

---
[1] Sysadmins:  Like me, no claim to dev here, nor am I a professional 
sysadmin, but arguably I do take the responsibility of adminning my own 
systems more seriously than most appear to, enough to claim sysadmin as 
an appropriate descriptor.

[2] Sysctl config.  Look in /etc/sysctl.d/* and/or /etc/sysctl.conf, as 
appropriate to your distro.

[3] Approaches: The memory figure used for calculating this percentage 
excludes some things so it won't actually reach 10% of total memory.  But 
the exclusions are small enough that they can be hand-waved away for 
purposes of this discussion.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
  2016-12-13 17:11 page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another David Arendt
  2016-12-13 19:55 ` Xin Zhou
@ 2016-12-14 10:17 ` Michal Hocko
  2016-12-14 10:30   ` Michal Hocko
  1 sibling, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2016-12-14 10:17 UTC (permalink / raw)
  To: David Arendt; +Cc: linux-btrfs, linux-kernel, David Sterba, Chris Mason

On Tue 13-12-16 18:11:01, David Arendt wrote:
> Hi,
> 
> I receive the following page allocation stall while copying lots of
> large files from one btrfs hdd to another.
> 
> Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls for 12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
> Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8 Tainted: P           O    4.9.0 #1
[...]
> Dec 13 13:04:29 server kernel: Call Trace:
> Dec 13 13:04:29 server kernel:  [<ffffffff813f3a59>] ? dump_stack+0x46/0x5d
> Dec 13 13:04:29 server kernel:  [<ffffffff81114fc1>] ? warn_alloc+0x111/0x130
> Dec 13 13:04:33 server kernel:  [<ffffffff81115c38>] ? __alloc_pages_nodemask+0xbe8/0xd30
> Dec 13 13:04:33 server kernel:  [<ffffffff8110de74>] ? pagecache_get_page+0xe4/0x230
> Dec 13 13:04:33 server kernel:  [<ffffffff81323a8b>] ? alloc_extent_buffer+0x10b/0x400
> Dec 13 13:04:33 server kernel:  [<ffffffff812ef8c5>] ? btrfs_alloc_tree_block+0x125/0x560

OK, so this is
	find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL)

The main question is whether this really needs to be a NOFS request...

> Dec 13 13:04:33 server kernel:  [<ffffffff8132442f>] ? read_extent_buffer_pages+0x21f/0x280
> Dec 13 13:04:33 server kernel:  [<ffffffff812d81f1>] ? __btrfs_cow_block+0x141/0x580
> Dec 13 13:04:33 server kernel:  [<ffffffff812d87b0>] ? btrfs_cow_block+0x100/0x150
> Dec 13 13:04:33 server kernel:  [<ffffffff812dc1d9>] ?  btrfs_search_slot+0x1e9/0x9c0
> Dec 13 13:04:33 server kernel:  [<ffffffff8131ead2>] ? __set_extent_bit+0x512/0x550
> Dec 13 13:04:33 server kernel:  [<ffffffff812e1ab5>] ? lookup_inline_extent_backref+0xf5/0x5e0
> Dec 13 13:04:34 server kernel:  [<ffffffff8131f0a4>] ? set_extent_bit+0x24/0x30
> Dec 13 13:04:34 server kernel:  [<ffffffff812e4334>] ? update_block_group.isra.34+0x114/0x380
> Dec 13 13:04:34 server kernel:  [<ffffffff812e4694>] ? __btrfs_free_extent.isra.35+0xf4/0xd20
> Dec 13 13:04:34 server kernel:  [<ffffffff8134d561>] ? btrfs_merge_delayed_refs+0x61/0x5d0
> Dec 13 13:04:34 server kernel:  [<ffffffff812e8bd2>] ? __btrfs_run_delayed_refs+0x902/0x10a0
> Dec 13 13:04:34 server kernel:  [<ffffffff812ec0f0>] ? btrfs_run_delayed_refs+0x90/0x2a0
> Dec 13 13:04:34 server kernel:  [<ffffffff812ec384>] ? delayed_ref_async_start+0x84/0xa0

What would cause the reclaim recursion?

> Dec 13 13:04:34 server kernel: Mem-Info:
> Dec 13 13:04:34 server kernel: active_anon:20 inactive_anon:34
> isolated_anon:0\x0a active_file:7370032 inactive_file:450105
> isolated_file:320\x0a unevictable:0 dirty:522748 writeback:189
> unstable:0\x0a slab_reclaimable:178255 slab_unreclaimable:124617\x0a
> mapped:4236 shmem:0 pagetables:1163 bounce:0\x0a free:38224 free_pcp:241
> free_cma:0

This speaks for itself. There is a lot of dirty data, basically no
anonymous memory, and GFP_NOFS obviously cannot do much to reclaim. This
is either a configuration bug, as somebody noted down the thread (setting
the dirty_ratio), or a suboptimality of the btrfs code, which might
request NOFS even though it is not strictly necessary. That would be more
a question for the btrfs developers.
-- 
Michal Hocko
SUSE Labs


* Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
  2016-12-14 10:17 ` Michal Hocko
@ 2016-12-14 10:30   ` Michal Hocko
  2016-12-14 11:38     ` admin
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2016-12-14 10:30 UTC (permalink / raw)
  To: David Arendt; +Cc: linux-btrfs, linux-kernel, David Sterba, Chris Mason

Btw. the stall should be preceded by an OOM killer invocation. Could
you share the OOM report, please? I am asking because such an OOM kill
would clearly be premature given your meminfo. I am trying to change
that code, and seeing your numbers might help me.

Thanks!

On Wed 14-12-16 11:17:43, Michal Hocko wrote:
> On Tue 13-12-16 18:11:01, David Arendt wrote:
> > Hi,
> > 
> > I receive the following page allocation stall while copying lots of
> > large files from one btrfs hdd to another.
> > 
> > Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls for 12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
> > Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8 Tainted: P           O    4.9.0 #1
> [...]
> > Dec 13 13:04:29 server kernel: Call Trace:
> > Dec 13 13:04:29 server kernel:  [<ffffffff813f3a59>] ? dump_stack+0x46/0x5d
> > Dec 13 13:04:29 server kernel:  [<ffffffff81114fc1>] ? warn_alloc+0x111/0x130
> > Dec 13 13:04:33 server kernel:  [<ffffffff81115c38>] ? __alloc_pages_nodemask+0xbe8/0xd30
> > Dec 13 13:04:33 server kernel:  [<ffffffff8110de74>] ? pagecache_get_page+0xe4/0x230
> > Dec 13 13:04:33 server kernel:  [<ffffffff81323a8b>] ? alloc_extent_buffer+0x10b/0x400
> > Dec 13 13:04:33 server kernel:  [<ffffffff812ef8c5>] ? btrfs_alloc_tree_block+0x125/0x560
> 
> OK, so this is
> 	find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL)
> 
> The main question is whether this really needs to be NOFS request...
> 
> > Dec 13 13:04:33 server kernel:  [<ffffffff8132442f>] ? read_extent_buffer_pages+0x21f/0x280
> > Dec 13 13:04:33 server kernel:  [<ffffffff812d81f1>] ? __btrfs_cow_block+0x141/0x580
> > Dec 13 13:04:33 server kernel:  [<ffffffff812d87b0>] ? btrfs_cow_block+0x100/0x150
> > Dec 13 13:04:33 server kernel:  [<ffffffff812dc1d9>] ?  btrfs_search_slot+0x1e9/0x9c0
> > Dec 13 13:04:33 server kernel:  [<ffffffff8131ead2>] ? __set_extent_bit+0x512/0x550
> > Dec 13 13:04:33 server kernel:  [<ffffffff812e1ab5>] ? lookup_inline_extent_backref+0xf5/0x5e0
> > Dec 13 13:04:34 server kernel:  [<ffffffff8131f0a4>] ? set_extent_bit+0x24/0x30
> > Dec 13 13:04:34 server kernel:  [<ffffffff812e4334>] ? update_block_group.isra.34+0x114/0x380
> > Dec 13 13:04:34 server kernel:  [<ffffffff812e4694>] ? __btrfs_free_extent.isra.35+0xf4/0xd20
> > Dec 13 13:04:34 server kernel:  [<ffffffff8134d561>] ? btrfs_merge_delayed_refs+0x61/0x5d0
> > Dec 13 13:04:34 server kernel:  [<ffffffff812e8bd2>] ? __btrfs_run_delayed_refs+0x902/0x10a0
> > Dec 13 13:04:34 server kernel:  [<ffffffff812ec0f0>] ? btrfs_run_delayed_refs+0x90/0x2a0
> > Dec 13 13:04:34 server kernel:  [<ffffffff812ec384>] ? delayed_ref_async_start+0x84/0xa0
> 
> What would cause the reclaim recursion?
> 
> > Dec 13 13:04:34 server kernel: Mem-Info:
> > Dec 13 13:04:34 server kernel: active_anon:20 inactive_anon:34
> > isolated_anon:0\x0a active_file:7370032 inactive_file:450105
> > isolated_file:320\x0a unevictable:0 dirty:522748 writeback:189
> > unstable:0\x0a slab_reclaimable:178255 slab_unreclaimable:124617\x0a
> > mapped:4236 shmem:0 pagetables:1163 bounce:0\x0a free:38224 free_pcp:241
> > free_cma:0
> 
> This speaks for itself. There is a lot of dirty data, basically no
> anonymous memory and GFP_NOFS cannot do much to reclaim obviously. This
> is either a configuraion bug as somebody noted down the thread (setting
> the dirty_ratio) or suboptimality of the btrfs code which might request
> NOFS even though it is not strictly necessary. This would be more for
> btrfs developers.
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs


* Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
  2016-12-14 10:30   ` Michal Hocko
@ 2016-12-14 11:38     ` admin
  2016-12-14 23:23       ` Xin Zhou
  0 siblings, 1 reply; 9+ messages in thread
From: admin @ 2016-12-14 11:38 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-btrfs, linux-kernel, David Sterba, Chris Mason

Hi,

I checked the log files and see no prior OOM killer invocation. Unfortunately, the machine has been rebooted since. Next time it happens, I will also look in dmesg.
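
For reference, what I checked was roughly the following (the journalctl
line assumes the systemd journal is in use; the grep pattern is just my
guess at the usual OOM killer wording):

# current boot
dmesg | grep -iE 'oom|out of memory'
# previous boot, kernel messages only
journalctl -k -b -1 | grep -iE 'oom|out of memory'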

Thanks,
David Arendt 


Michal Hocko – Wed., 14. December 2016 11:31
> Btw. the stall should be preceded by the OOM killer invocation. Could
> you share the OOM report please. I am asking because such an OOM killer
> would be clearly pre-mature as per your meminfo. I am trying to change
> that code and seeing your numbers might help me.
> 
> Thanks!
> 
> On Wed 14-12-16 11:17:43, Michal Hocko wrote:
> > On Tue 13-12-16 18:11:01, David Arendt wrote:
> > > Hi,
> > > 
> > > I receive the following page allocation stall while copying lots of
> > > large files from one btrfs hdd to another.
> > > 
> > > Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls for 12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
> > > Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8 Tainted: P           O    4.9.0 #1
> > [...]
> > > Dec 13 13:04:29 server kernel: Call Trace:
> > > Dec 13 13:04:29 server kernel:  [<ffffffff813f3a59>] ? dump_stack+0x46/0x5d
> > > Dec 13 13:04:29 server kernel:  [<ffffffff81114fc1>] ? warn_alloc+0x111/0x130
> > > Dec 13 13:04:33 server kernel:  [<ffffffff81115c38>] ? __alloc_pages_nodemask+0xbe8/0xd30
> > > Dec 13 13:04:33 server kernel:  [<ffffffff8110de74>] ? pagecache_get_page+0xe4/0x230
> > > Dec 13 13:04:33 server kernel:  [<ffffffff81323a8b>] ? alloc_extent_buffer+0x10b/0x400
> > > Dec 13 13:04:33 server kernel:  [<ffffffff812ef8c5>] ? btrfs_alloc_tree_block+0x125/0x560
> > 
> > OK, so this is
> > 	find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL)
> > 
> > The main question is whether this really needs to be NOFS request...
> > 
> > > Dec 13 13:04:33 server kernel:  [<ffffffff8132442f>] ? read_extent_buffer_pages+0x21f/0x280
> > > Dec 13 13:04:33 server kernel:  [<ffffffff812d81f1>] ? __btrfs_cow_block+0x141/0x580
> > > Dec 13 13:04:33 server kernel:  [<ffffffff812d87b0>] ? btrfs_cow_block+0x100/0x150
> > > Dec 13 13:04:33 server kernel:  [<ffffffff812dc1d9>] ?  btrfs_search_slot+0x1e9/0x9c0
> > > Dec 13 13:04:33 server kernel:  [<ffffffff8131ead2>] ? __set_extent_bit+0x512/0x550
> > > Dec 13 13:04:33 server kernel:  [<ffffffff812e1ab5>] ? lookup_inline_extent_backref+0xf5/0x5e0
> > > Dec 13 13:04:34 server kernel:  [<ffffffff8131f0a4>] ? set_extent_bit+0x24/0x30
> > > Dec 13 13:04:34 server kernel:  [<ffffffff812e4334>] ? update_block_group.isra.34+0x114/0x380
> > > Dec 13 13:04:34 server kernel:  [<ffffffff812e4694>] ? __btrfs_free_extent.isra.35+0xf4/0xd20
> > > Dec 13 13:04:34 server kernel:  [<ffffffff8134d561>] ? btrfs_merge_delayed_refs+0x61/0x5d0
> > > Dec 13 13:04:34 server kernel:  [<ffffffff812e8bd2>] ? __btrfs_run_delayed_refs+0x902/0x10a0
> > > Dec 13 13:04:34 server kernel:  [<ffffffff812ec0f0>] ? btrfs_run_delayed_refs+0x90/0x2a0
> > > Dec 13 13:04:34 server kernel:  [<ffffffff812ec384>] ? delayed_ref_async_start+0x84/0xa0
> > 
> > What would cause the reclaim recursion?
> > 
> > > Dec 13 13:04:34 server kernel: Mem-Info:
> > > Dec 13 13:04:34 server kernel: active_anon:20 inactive_anon:34
> > > isolated_anon:0\x0a active_file:7370032 inactive_file:450105
> > > isolated_file:320\x0a unevictable:0 dirty:522748 writeback:189
> > > unstable:0\x0a slab_reclaimable:178255 slab_unreclaimable:124617\x0a
> > > mapped:4236 shmem:0 pagetables:1163 bounce:0\x0a free:38224 free_pcp:241
> > > free_cma:0
> > 
> > This speaks for itself. There is a lot of dirty data, basically no
> > anonymous memory and GFP_NOFS cannot do much to reclaim obviously. This
> > is either a configuraion bug as somebody noted down the thread (setting
> > the dirty_ratio) or suboptimality of the btrfs code which might request
> > NOFS even though it is not strictly necessary. This would be more for
> > btrfs developers.
> > -- 
> > Michal Hocko
> > SUSE Labs
> 
> -- 
> Michal Hocko
> SUSE Labs


* Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
  2016-12-14 11:38     ` admin
@ 2016-12-14 23:23       ` Xin Zhou
  2016-12-15 11:52         ` admin
  0 siblings, 1 reply; 9+ messages in thread
From: Xin Zhou @ 2016-12-14 23:23 UTC (permalink / raw)
  To: admin; +Cc: Michal Hocko, linux-btrfs, linux-kernel, David Sterba, Chris Mason

Hi,

There is a large amount of dirty data, which probably cannot be committed to disk fast enough.
And this seems to happen when copying from 7200rpm to 5600rpm disks, according to the previous post.

Probably the I/Os are buffered and pending, unable to finish in time.
It might be helpful to know whether this only happens with specific types of 5600rpm disks.

And are these disks in RAID groups? Thanks.
Xin
 
 

Sent: Wednesday, December 14, 2016 at 3:38 AM
From: admin <admin@prnet.org>
To: "Michal Hocko" <mhocko@kernel.org>
Cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, "David Sterba" <dsterba@suse.cz>, "Chris Mason" <clm@fb.com>
Subject: Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
Hi,

I verified the log files and see no prior oom killer invocation. Unfortunately the machine has been rebooted since. Next time it happens, I will also look in dmesg.

Thanks,
David Arendt


Michal Hocko – Wed., 14. December 2016 11:31
> Btw. the stall should be preceded by the OOM killer invocation. Could
> you share the OOM report please. I am asking because such an OOM killer
> would be clearly pre-mature as per your meminfo. I am trying to change
> that code and seeing your numbers might help me.
>
> Thanks!
>
> On Wed 14-12-16 11:17:43, Michal Hocko wrote:
> > On Tue 13-12-16 18:11:01, David Arendt wrote:
> > > Hi,
> > >
> > > I receive the following page allocation stall while copying lots of
> > > large files from one btrfs hdd to another.
> > >
> > > Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls for 12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
> > > Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8 Tainted: P O 4.9.0 #1
> > [...]
> > > Dec 13 13:04:29 server kernel: Call Trace:
> > > Dec 13 13:04:29 server kernel: [<ffffffff813f3a59>] ? dump_stack+0x46/0x5d
> > > Dec 13 13:04:29 server kernel: [<ffffffff81114fc1>] ? warn_alloc+0x111/0x130
> > > Dec 13 13:04:33 server kernel: [<ffffffff81115c38>] ? __alloc_pages_nodemask+0xbe8/0xd30
> > > Dec 13 13:04:33 server kernel: [<ffffffff8110de74>] ? pagecache_get_page+0xe4/0x230
> > > Dec 13 13:04:33 server kernel: [<ffffffff81323a8b>] ? alloc_extent_buffer+0x10b/0x400
> > > Dec 13 13:04:33 server kernel: [<ffffffff812ef8c5>] ? btrfs_alloc_tree_block+0x125/0x560
> >
> > OK, so this is
> > find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL)
> >
> > The main question is whether this really needs to be NOFS request...
> >
> > > Dec 13 13:04:33 server kernel: [<ffffffff8132442f>] ? read_extent_buffer_pages+0x21f/0x280
> > > Dec 13 13:04:33 server kernel: [<ffffffff812d81f1>] ? __btrfs_cow_block+0x141/0x580
> > > Dec 13 13:04:33 server kernel: [<ffffffff812d87b0>] ? btrfs_cow_block+0x100/0x150
> > > Dec 13 13:04:33 server kernel: [<ffffffff812dc1d9>] ? btrfs_search_slot+0x1e9/0x9c0
> > > Dec 13 13:04:33 server kernel: [<ffffffff8131ead2>] ? __set_extent_bit+0x512/0x550
> > > Dec 13 13:04:33 server kernel: [<ffffffff812e1ab5>] ? lookup_inline_extent_backref+0xf5/0x5e0
> > > Dec 13 13:04:34 server kernel: [<ffffffff8131f0a4>] ? set_extent_bit+0x24/0x30
> > > Dec 13 13:04:34 server kernel: [<ffffffff812e4334>] ? update_block_group.isra.34+0x114/0x380
> > > Dec 13 13:04:34 server kernel: [<ffffffff812e4694>] ? __btrfs_free_extent.isra.35+0xf4/0xd20
> > > Dec 13 13:04:34 server kernel: [<ffffffff8134d561>] ? btrfs_merge_delayed_refs+0x61/0x5d0
> > > Dec 13 13:04:34 server kernel: [<ffffffff812e8bd2>] ? __btrfs_run_delayed_refs+0x902/0x10a0
> > > Dec 13 13:04:34 server kernel: [<ffffffff812ec0f0>] ? btrfs_run_delayed_refs+0x90/0x2a0
> > > Dec 13 13:04:34 server kernel: [<ffffffff812ec384>] ? delayed_ref_async_start+0x84/0xa0
> >
> > What would cause the reclaim recursion?
> >
> > > Dec 13 13:04:34 server kernel: Mem-Info:
> > > Dec 13 13:04:34 server kernel: active_anon:20 inactive_anon:34
> > > isolated_anon:0\x0a active_file:7370032 inactive_file:450105
> > > isolated_file:320\x0a unevictable:0 dirty:522748 writeback:189
> > > unstable:0\x0a slab_reclaimable:178255 slab_unreclaimable:124617\x0a
> > > mapped:4236 shmem:0 pagetables:1163 bounce:0\x0a free:38224 free_pcp:241
> > > free_cma:0
> >
> > This speaks for itself. There is a lot of dirty data, basically no
> > anonymous memory and GFP_NOFS cannot do much to reclaim obviously. This
> > is either a configuraion bug as somebody noted down the thread (setting
> > the dirty_ratio) or suboptimality of the btrfs code which might request
> > NOFS even though it is not strictly necessary. This would be more for
> > btrfs developers.
> > --
> > Michal Hocko
> > SUSE Labs
>
> --
> Michal Hocko
> SUSE Labs


* Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
  2016-12-14 23:23       ` Xin Zhou
@ 2016-12-15 11:52         ` admin
  0 siblings, 0 replies; 9+ messages in thread
From: admin @ 2016-12-15 11:52 UTC (permalink / raw)
  To: Xin Zhou
  Cc: Michal Hocko, linux-btrfs, linux-kernel, David Sterba, Chris Mason

Hi,

The source is a software RAID 5 (md) of 4x 4TB Western Digital RE4 disks. The destination is a hardware RAID 5 enclosure containing 4x 8TB Seagate Archival disks connected via eSATA.

I am currently trying Duncan's suggestions. With them, the page allocation stall does not seem to appear, and overall system responsiveness during copying is also much better.
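
For completeness, I am double-checking that the new values are actually
in effect with (assuming the sysctl utility is available):

sysctl vm.dirty_background_ratio vm.dirty_ratio vm.dirty_writeback_centisecs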

Thanks,
David Arendt


Xin Zhou – Thu., 15. December 2016 0:24
> Hi,
> 
> The dirty data is in large amount, probably unable to commit to disk.
> And this seems to happen when copying from 7200rpm to 5600rpm disks, according to previous post.
> 
> Probably the I/Os are buffered and pending, unable to get finished in-time.
> It might be helpful to know if this only happens for specific types of 5600 rpm disks?
> 
> And are these disks on RAID groups? Thanks.
> Xin
>  
>  
> 
> Sent: Wednesday, December 14, 2016 at 3:38 AM
> From: admin <admin@prnet.org>
> To: "Michal Hocko" <mhocko@kernel.org>
> Cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, "David Sterba" <dsterba@suse.cz>, "Chris Mason" <clm@fb.com>
> Subject: Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
> Hi,
> 
> I verified the log files and see no prior oom killer invocation. Unfortunately the machine has been rebooted since. Next time it happens, I will also look in dmesg.
> 
> Thanks,
> David Arendt
> 
> 
> Michal Hocko – Wed., 14. December 2016 11:31
> > Btw. the stall should be preceded by the OOM killer invocation. Could
> > you share the OOM report please. I am asking because such an OOM killer
> > would be clearly pre-mature as per your meminfo. I am trying to change
> > that code and seeing your numbers might help me.
> >
> > Thanks!
> >
> > On Wed 14-12-16 11:17:43, Michal Hocko wrote:
> > > On Tue 13-12-16 18:11:01, David Arendt wrote:
> > > > Hi,
> > > >
> > > > I receive the following page allocation stall while copying lots of
> > > > large files from one btrfs hdd to another.
> > > >
> > > > Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls for 12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
> > > > Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8 Tainted: P O 4.9.0 #1
> > > [...]
> > > > Dec 13 13:04:29 server kernel: Call Trace:
> > > > Dec 13 13:04:29 server kernel: [<ffffffff813f3a59>] ? dump_stack+0x46/0x5d
> > > > Dec 13 13:04:29 server kernel: [<ffffffff81114fc1>] ? warn_alloc+0x111/0x130
> > > > Dec 13 13:04:33 server kernel: [<ffffffff81115c38>] ? __alloc_pages_nodemask+0xbe8/0xd30
> > > > Dec 13 13:04:33 server kernel: [<ffffffff8110de74>] ? pagecache_get_page+0xe4/0x230
> > > > Dec 13 13:04:33 server kernel: [<ffffffff81323a8b>] ? alloc_extent_buffer+0x10b/0x400
> > > > Dec 13 13:04:33 server kernel: [<ffffffff812ef8c5>] ? btrfs_alloc_tree_block+0x125/0x560
> > >
> > > OK, so this is
> > > find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL)
> > >
> > > The main question is whether this really needs to be NOFS request...
> > >
> > > > Dec 13 13:04:33 server kernel: [<ffffffff8132442f>] ? read_extent_buffer_pages+0x21f/0x280
> > > > Dec 13 13:04:33 server kernel: [<ffffffff812d81f1>] ? __btrfs_cow_block+0x141/0x580
> > > > Dec 13 13:04:33 server kernel: [<ffffffff812d87b0>] ? btrfs_cow_block+0x100/0x150
> > > > Dec 13 13:04:33 server kernel: [<ffffffff812dc1d9>] ? btrfs_search_slot+0x1e9/0x9c0
> > > > Dec 13 13:04:33 server kernel: [<ffffffff8131ead2>] ? __set_extent_bit+0x512/0x550
> > > > Dec 13 13:04:33 server kernel: [<ffffffff812e1ab5>] ? lookup_inline_extent_backref+0xf5/0x5e0
> > > > Dec 13 13:04:34 server kernel: [<ffffffff8131f0a4>] ? set_extent_bit+0x24/0x30
> > > > Dec 13 13:04:34 server kernel: [<ffffffff812e4334>] ? update_block_group.isra.34+0x114/0x380
> > > > Dec 13 13:04:34 server kernel: [<ffffffff812e4694>] ? __btrfs_free_extent.isra.35+0xf4/0xd20
> > > > Dec 13 13:04:34 server kernel: [<ffffffff8134d561>] ? btrfs_merge_delayed_refs+0x61/0x5d0
> > > > Dec 13 13:04:34 server kernel: [<ffffffff812e8bd2>] ? __btrfs_run_delayed_refs+0x902/0x10a0
> > > > Dec 13 13:04:34 server kernel: [<ffffffff812ec0f0>] ? btrfs_run_delayed_refs+0x90/0x2a0
> > > > Dec 13 13:04:34 server kernel: [<ffffffff812ec384>] ? delayed_ref_async_start+0x84/0xa0
> > >
> > > What would cause the reclaim recursion?
> > >
> > > > Dec 13 13:04:34 server kernel: Mem-Info:
> > > > Dec 13 13:04:34 server kernel: active_anon:20 inactive_anon:34
> > > > isolated_anon:0\x0a active_file:7370032 inactive_file:450105
> > > > isolated_file:320\x0a unevictable:0 dirty:522748 writeback:189
> > > > unstable:0\x0a slab_reclaimable:178255 slab_unreclaimable:124617\x0a
> > > > mapped:4236 shmem:0 pagetables:1163 bounce:0\x0a free:38224 free_pcp:241
> > > > free_cma:0
> > >
> > > This speaks for itself. There is a lot of dirty data, basically no
> > > anonymous memory and GFP_NOFS cannot do much to reclaim obviously. This
> > > is either a configuraion bug as somebody noted down the thread (setting
> > > the dirty_ratio) or suboptimality of the btrfs code which might request
> > > NOFS even though it is not strictly necessary. This would be more for
> > > btrfs developers.
> > > --
> > > Michal Hocko
> > > SUSE Labs
> >
> > --
> > Michal Hocko
> > SUSE Labs


Thread overview: 9+ messages
2016-12-13 17:11 page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another David Arendt
2016-12-13 19:55 ` Xin Zhou
2016-12-13 20:26   ` David Arendt
2016-12-13 23:28     ` Duncan
2016-12-14 10:17 ` Michal Hocko
2016-12-14 10:30   ` Michal Hocko
2016-12-14 11:38     ` admin
2016-12-14 23:23       ` Xin Zhou
2016-12-15 11:52         ` admin
