* rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS
@ 2017-02-27 21:36 Robert Kudyba
  2017-02-28 14:15 ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: Robert Kudyba @ 2017-02-27 21:36 UTC (permalink / raw)
To: linux-kernel

This sounds very similar to David Arendt’s problem from December 2016 (the 12/13 post numbered 529 in the URL, titled “page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another”). This happens about 2 hours after our daily rsnapshot/rsync cron job starts, pretty much every time. I have set these parameters in /etc/sysctl.conf on a Fedora 25 server:

net.ipv4.tcp_window_scaling=0
net.ipv4.tcp_timestamps=0
net.ipv4.tcp_tw_recycle=0
vm.zone_reclaim_mode=1
vm.min_free_kbytes=131072
# write-cache, foreground/background flushing
vm.dirty_ratio = 3
# vm.dirty_background_ratio = 5 (% of RAM)
vm.dirty_background_ratio = 1
# make it 10 sec
vm.dirty_writeback_centisecs = 1000

Feb 27 04:36:54 curie kernel: rsync: page allocation stalls for 10699ms, order:0, mode:0x2420848(GFP_NOFS|__GFP_NOFAIL|__GFP_HARDWALL|__GFP_MOVABLE)
Feb 27 04:36:54 curie kernel: CPU: 2 PID: 32649 Comm: rsync Tainted: G L 4.9.10-200.fc25.i686+PAE #1
Feb 27 04:36:54 curie kernel: Hardware name: Dell Computer Corporation PowerEdge 1850/0D8266, BIOS A07 04/25/2008
Feb 27 04:36:54 curie kernel: ec725bac cd76bd10 cdd774bc 00000000 ec725bdc cd584b96 cdd7bc69 02420848
Feb 27 04:36:54 curie kernel: ec725be4 ec725bec cdd774bc ec725bc0 a0988a17 cdef9c48 ffffd635 00000000
Feb 27 04:36:54 curie kernel: ec725ca4 cd585686 02420848 cdd774bc 000029cb 00000000 00000010 00000050
Feb 27 04:36:54 curie kernel: Call Trace:
Feb 27 04:36:54 curie kernel: [<cd76bd10>] dump_stack+0x58/0x78
Feb 27 04:36:54 curie kernel: [<cd584b96>] warn_alloc+0xf6/0x110
Feb 27 04:36:54 curie kernel: [<cd585686>] __alloc_pages_nodemask+0xa66/0xd30
Feb 27 04:36:54 curie kernel: [<cd57c212>] ? find_get_entry+0x22/0x160
Feb 27 04:36:54 curie kernel: [<cd57cb92>] pagecache_get_page+0xc2/0x2d0
Feb 27 04:36:54 curie kernel: [<cd61f3a9>] __getblk_gfp+0x109/0x330
Feb 27 04:36:54 curie kernel: [<cd673a73>] ext4_getblk+0xa3/0x1b0
Feb 27 04:36:54 curie kernel: [<cd673ba3>] ext4_bread+0x23/0xb0
Feb 27 04:36:54 curie kernel: [<cd67dc97>] __ext4_read_dirblock+0x27/0x3a0
Feb 27 04:36:54 curie kernel: [<cd5f50f6>] ? lookup_fast+0x46/0x2d0
Feb 27 04:36:54 curie kernel: [<cd67e652>] htree_dirblock_to_tree+0x52/0x260
Feb 27 04:36:54 curie kernel: [<cd60a690>] ? mntput+0x20/0x40
Feb 27 04:36:54 curie kernel: [<cd5f431f>] ? terminate_walk+0x8f/0x100
Feb 27 04:36:54 curie kernel: [<cd67f730>] ext4_htree_fill_tree+0x90/0x2e0
Feb 27 04:36:54 curie kernel: [<cd5f9aa4>] ? putname+0x54/0x60
Feb 27 04:36:54 curie kernel: [<cd66b9d1>] ? ext4_readdir+0x9d1/0xa60
Feb 27 04:36:54 curie kernel: [<cd66b9d1>] ? ext4_readdir+0x9d1/0xa60
Feb 27 04:36:54 curie kernel: [<cd66ab46>] ? free_rb_tree_fname+0x16/0x70
Feb 27 04:36:54 curie kernel: [<cd66b7a4>] ext4_readdir+0x7a4/0xa60
Feb 27 04:36:54 curie kernel: [<cd77a090>] ? _copy_to_user+0x40/0x50
Feb 27 04:36:54 curie kernel: [<cd6ee4bf>] ? security_file_permission+0x9f/0xc0
Feb 27 04:36:54 curie kernel: [<cd5fe259>] iterate_dir+0x179/0x1a0
Feb 27 04:36:54 curie kernel: [<cd5f00d0>] ? SyS_fstat64+0x30/0x50
Feb 27 04:36:54 curie kernel: [<cd5fe83b>] SyS_getdents64+0x7b/0x100
Feb 27 04:36:54 curie kernel: [<cd5fe280>] ? iterate_dir+0x1a0/0x1a0
Feb 27 04:36:54 curie kernel: [<cd40378a>] do_fast_syscall_32+0x8a/0x150
Feb 27 04:36:54 curie kernel: [<cdb723ca>] sysenter_past_esp+0x47/0x75
Feb 27 04:36:54 curie kernel: Mem-Info:
Feb 27 04:36:55 curie kernel: active_anon:2091 inactive_anon:1298 isolated_anon:0
 active_file:268004 inactive_file:1386276 isolated_file:0
 unevictable:1986 dirty:0 writeback:0 unstable:0
 slab_reclaimable:144934 slab_unreclaimable:7392
 mapped:14331 shmem:170 pagetables:460 bounce:0
 free:252753 free_pcp:470 free_cma:0
Feb 27 04:36:55 curie kernel: Node 0 active_anon:8364kB inactive_anon:5192kB active_file:1072016kB inactive_file:5545104kB unevictable:7944kB isolated(anon):0kB isolated(file):0kB mapped:57324kB dirty:0kB writeback:0kB shmem:680kB writeback_tmp:0kB unstable:0kB pages_scanned:171672794 all_unreclaimable? yes
Feb 27 04:36:55 curie kernel: DMA free:5584kB min:2580kB low:3224kB high:3868kB active_anon:0kB inactive_anon:0kB active_file:148kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15916kB mlocked:0kB slab_reclaimable:10032kB slab_unreclaimable:120kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Feb 27 04:36:55 curie kernel: lowmem_reserve[]: 0 751 8055 8055
Feb 27 04:36:55 curie kernel: Normal free:128408kB min:128488kB low:160608kB high:192728kB active_anon:0kB inactive_anon:0kB active_file:44340kB inactive_file:4720kB unevictable:0kB writepending:0kB present:892920kB managed:791608kB mlocked:0kB slab_reclaimable:569704kB slab_unreclaimable:29448kB kernel_stack:1168kB pagetables:0kB bounce:0kB free_pcp:1568kB local_pcp:236kB free_cma:0kB
Feb 27 04:36:55 curie kernel: lowmem_reserve[]: 0 0 58430 58430
Feb 27 04:36:55 curie kernel: HighMem free:877020kB min:512kB low:303996kB high:607480kB active_anon:8364kB inactive_anon:5192kB active_file:1027528kB inactive_file:5540384kB unevictable:7944kB writepending:0kB present:7479040kB managed:7479040kB mlocked:7944kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:1840kB bounce:0kB free_pcp:312kB local_pcp:196kB free_cma:0kB
Feb 27 04:36:55 curie kernel: lowmem_reserve[]: 0 0 0 0
Feb 27 04:36:55 curie kernel: DMA: 16*4kB (ME) 26*8kB (UME) 8*16kB (UE) 2*32kB (ME) 2*64kB (UM) 1*128kB (U) 3*256kB (UME) 2*512kB (UM) 3*1024kB (UME) 0*2048kB 0*4096kB = 5584kB
Feb 27 04:36:55 curie kernel: Normal: 556*4kB (UME) 1269*8kB (UEH) 568*16kB (UE) 362*32kB (UME) 116*64kB (UMEH) 23*128kB (UMEH) 66*256kB (MEH) 69*512kB (ME) 22*1024kB (ME) 5*2048kB (M) 0*4096kB = 128408kB
Feb 27 04:36:55 curie kernel: HighMem: 6301*4kB (UM) 3861*8kB (UM) 1616*16kB (UM) 1468*32kB (UM) 247*64kB (UM) 237*128kB (UM) 462*256kB (UM) 476*512kB (UM) 228*1024kB (UM) 6*2048kB (UM) 23*4096kB (UM) = 877020kB
Feb 27 04:36:55 curie kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Feb 27 04:36:55 curie kernel: 1656105 total pagecache pages
Feb 27 04:36:55 curie kernel: 81 pages in swap cache
Feb 27 04:36:55 curie kernel: Swap cache stats: add 6469, delete 6388, find 28828/29500
Feb 27 04:36:55 curie kernel: Free swap = 8352872kB
Feb 27 04:36:55 curie kernel: Total swap = 8355836kB
Feb 27 04:36:55 curie kernel: 2096989 pages RAM
Feb 27 04:36:55 curie kernel: 1869760 pages HighMem/MovableOnly
Feb 27 04:36:55 curie kernel: 25348 pages reserved
Feb 27 04:36:55 curie kernel: 0 pages hwpoisoned

^ permalink raw reply	[flat|nested] 10+ messages in thread
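The `= 128408kB` totals on the per-order free-list lines above are just the sum of the `count*sizekB` terms, which makes such dumps easy to cross-check mechanically. The following small parser is an editorial sketch, not part of the original report:

```python
import re

def buddy_free_kb(line: str) -> int:
    # Sum the "count*sizekB" terms of a buddy-allocator dump line,
    # e.g. "556*4kB (UME) 1269*8kB (UEH) ..." -> total free kB.
    return sum(int(n) * int(size) for n, size in re.findall(r"(\d+)\*(\d+)kB", line))

# The Normal zone free-list line from the report above:
normal = ("556*4kB (UME) 1269*8kB (UEH) 568*16kB (UE) 362*32kB (UME) "
          "116*64kB (UMEH) 23*128kB (UMEH) 66*256kB (MEH) 69*512kB (ME) "
          "22*1024kB (ME) 5*2048kB (M) 0*4096kB")
print(buddy_free_kb(normal))  # -> 128408, matching "= 128408kB" in the log
```

The same function reproduces the DMA total of 5584kB, which confirms the dump is internally consistent.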
* Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS 2017-02-27 21:36 rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS Robert Kudyba @ 2017-02-28 14:15 ` Michal Hocko [not found] ` <40F07E96-7468-4355-B8EA-4B42F575ACAB@fordham.edu> 0 siblings, 1 reply; 10+ messages in thread From: Michal Hocko @ 2017-02-28 14:15 UTC (permalink / raw) To: Robert Kudyba; +Cc: linux-kernel On Mon 27-02-17 16:36:43, Robert Kudyba wrote: > Feb 27 04:36:54 curie kernel: rsync: page allocation stalls for 10699ms, order:0, mode:0x2420848(GFP_NOFS|__GFP_NOFAIL|__GFP_HARDWALL|__GFP_MOVABLE) > Feb 27 04:36:54 curie kernel: CPU: 2 PID: 32649 Comm: rsync Tainted: G L 4.9.10-200.fc25.i686+PAE #1 [...] This is a lowmem request (aka only zones DMA and Normal can be used). > Feb 27 04:36:55 curie kernel: DMA free:5584kB min:2580kB low:3224kB high:3868kB active_anon:0kB inactive_anon:0kB active_file:148kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15916kB mlocked:0kB slab_reclaimable:10032kB slab_unreclaimable:120kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > Feb 27 04:36:55 curie kernel: lowmem_reserve[]: 0 751 8055 8055 this one is protected by the lowmem_reserve > Feb 27 04:36:55 curie kernel: Normal free:128408kB min:128488kB low:160608kB high:192728kB active_anon:0kB inactive_anon:0kB active_file:44340kB inactive_file:4720kB unevictable:0kB writepending:0kB present:892920kB managed:791608kB mlocked:0kB slab_reclaimable:569704kB slab_unreclaimable:29448kB kernel_stack:1168kB pagetables:0kB bounce:0kB free_pcp:1568kB local_pcp:236kB free_cma:0kB and this one is hitting the min watermark while there is not really much to reclaim. Only the page cache which might be pinned and not reclaimable from this context because this is GFP_NOFS request. It is not all that surprising the reclaim context fights to get some memory. 
There is a huge amount of reclaimable slab, which probably just makes slow progress. That is not completely surprising on a 32b system, I am afraid.

Btw. does the stall repeat with increasing duration, or does it get resolved eventually?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread
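The watermark comparison being made here (Normal zone `free:128408kB` just under `min:128488kB`) can be illustrated with a simplified model of the per-zone min/low/high watermark scheme. This is an editorial sketch for readers of the archive, not kernel code, using the zone numbers from the dump quoted above:

```python
def zone_pressure(free_kb: int, min_kb: int, low_kb: int, high_kb: int) -> str:
    # Simplified model of the per-zone watermarks: below "min", allocations
    # must enter direct reclaim (and may stall); below "low", kswapd is
    # woken to reclaim in the background; above "high", the zone is balanced.
    if free_kb < min_kb:
        return "below min: direct reclaim / allocation stalls"
    if free_kb < low_kb:
        return "below low: kswapd reclaiming in background"
    if free_kb < high_kb:
        return "between low and high"
    return "balanced"

# Zone numbers from the dump quoted in this thread:
print(zone_pressure(128408, 128488, 160608, 192728))  # Normal zone
print(zone_pressure(877020, 512, 303996, 607480))     # HighMem zone
```

The Normal (lowmem) zone lands in the "below min" case while HighMem is balanced, which is exactly the 32b lowmem squeeze described in the reply above: plenty of free memory overall, but none of it usable for this GFP_NOFS lowmem request.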
[parent not found: <40F07E96-7468-4355-B8EA-4B42F575ACAB@fordham.edu>]
* Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS
  [not found] ` <40F07E96-7468-4355-B8EA-4B42F575ACAB@fordham.edu>
@ 2017-02-28 14:40   ` Michal Hocko
  [not found]     ` <3E4C7821-A93D-4956-A0E0-730BEC67C9F0@fordham.edu>
  0 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2017-02-28 14:40 UTC (permalink / raw)
To: Robert Kudyba; +Cc: linux-kernel

On Tue 28-02-17 09:33:49, Robert Kudyba wrote:
> 
> > On Feb 28, 2017, at 9:15 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > and this one is hitting the min watermark while there is not really
> > much to reclaim. Only the page cache which might be pinned and not
> > reclaimable from this context because this is GFP_NOFS request. It is
> > not all that surprising the reclaim context fights to get some memory.
> > There is a huge amount of the reclaimable slab which probably just makes
> > a slow progress.
> > 
> > That is not something completely surprsing on 32b system I am afraid.
> > 
> > Btw. is the stall repeating with the increased time or it gets resolved
> > eventually?
> 
> Yes and if you mean by repeating it’s not only affecting rsync but
> you can see just now automount and NetworkManager get these page
> allocation stalls and kswapd0 is getting heavy CPU load, are there any
> other settings I can adjust?

None that I am aware of. You might want to talk to FS guys; maybe they
can figure out who is pinning file pages so that they cannot be
reclaimed. They do not seem to be dirty or under writeback. It would
also be interesting to see whether this is a regression. The warning is
relatively new, so you might have had this problem before and just
haven't noticed it.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread
[parent not found: <3E4C7821-A93D-4956-A0E0-730BEC67C9F0@fordham.edu>]
* Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS [not found] ` <3E4C7821-A93D-4956-A0E0-730BEC67C9F0@fordham.edu> @ 2017-02-28 15:15 ` Michal Hocko 2017-02-28 16:19 ` Robert Kudyba 0 siblings, 1 reply; 10+ messages in thread From: Michal Hocko @ 2017-02-28 15:15 UTC (permalink / raw) To: Robert Kudyba; +Cc: linux-kernel On Tue 28-02-17 09:59:35, Robert Kudyba wrote: > > > On Feb 28, 2017, at 9:40 AM, Michal Hocko <mhocko@kernel.org> wrote: > > > > On Tue 28-02-17 09:33:49, Robert Kudyba wrote: > >> > >>> On Feb 28, 2017, at 9:15 AM, Michal Hocko <mhocko@kernel.org> wrote: > >>> and this one is hitting the min watermark while there is not really > >>> much to reclaim. Only the page cache which might be pinned and not > >>> reclaimable from this context because this is GFP_NOFS request. It is > >>> not all that surprising the reclaim context fights to get some memory. > >>> There is a huge amount of the reclaimable slab which probably just makes > >>> a slow progress. > >>> > >>> That is not something completely surprsing on 32b system I am afraid. > >>> > >>> Btw. is the stall repeating with the increased time or it gets resolved > >>> eventually? > >> > >> Yes and if you mean by repeating it’s not only affecting rsync but > >> you can see just now automount and NetworkManager get these page > >> allocation stalls and kswapd0 is getting heavy CPU load, are there any > >> other settings I can adjust? > > > > None that I am aware of. You might want to talk to FS guys, maybe they > > can figure out who is pinning file pages so that they cannot be > > reclaimed. They do not seem to be dirty or under writeback. It would be > > also interesting to see whether that is a regression. The warning is > > relatively new so you might have had this problem before just haven't > > noticed it. > > We have been getting out of memory errors for a while but those seem > to have gone away. this sounds suspicious. Are you really sure that this is a new problem? 
Are you really sure that this is a new problem?
Btw. is there any reason to use a 32b kernel at all? It will always suffer
from a really small lowmem...

> We did just replace the controller in the VessRAID
> as there were some timeouts observed and multiple login/logout
> attempts.
> 
> By FS guys do you mean the linux-fsdevel or linux-fsf list?

Yeah, linux-fsdevel. No idea what linux-fsf is. It would be great if you
could collect some tracepoints before reporting the issue. At least
those in events/vmscan/*.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS 2017-02-28 15:15 ` Michal Hocko @ 2017-02-28 16:19 ` Robert Kudyba 2017-02-28 16:56 ` Michal Hocko 0 siblings, 1 reply; 10+ messages in thread From: Robert Kudyba @ 2017-02-28 16:19 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-kernel > On Feb 28, 2017, at 10:15 AM, Michal Hocko <mhocko@kernel.org> wrote: > > On Tue 28-02-17 09:59:35, Robert Kudyba wrote: >> >>> On Feb 28, 2017, at 9:40 AM, Michal Hocko <mhocko@kernel.org> wrote: >>> >>> On Tue 28-02-17 09:33:49, Robert Kudyba wrote: >>>> >>>>> On Feb 28, 2017, at 9:15 AM, Michal Hocko <mhocko@kernel.org> wrote: >>>>> and this one is hitting the min watermark while there is not really >>>>> much to reclaim. Only the page cache which might be pinned and not >>>>> reclaimable from this context because this is GFP_NOFS request. It is >>>>> not all that surprising the reclaim context fights to get some memory. >>>>> There is a huge amount of the reclaimable slab which probably just makes >>>>> a slow progress. >>>>> >>>>> That is not something completely surprsing on 32b system I am afraid. >>>>> >>>>> Btw. is the stall repeating with the increased time or it gets resolved >>>>> eventually? >>>> >>>> Yes and if you mean by repeating it’s not only affecting rsync but >>>> you can see just now automount and NetworkManager get these page >>>> allocation stalls and kswapd0 is getting heavy CPU load, are there any >>>> other settings I can adjust? >>> >>> None that I am aware of. You might want to talk to FS guys, maybe they >>> can figure out who is pinning file pages so that they cannot be >>> reclaimed. They do not seem to be dirty or under writeback. It would be >>> also interesting to see whether that is a regression. The warning is >>> relatively new so you might have had this problem before just haven't >>> noticed it. >> >> We have been getting out of memory errors for a while but those seem >> to have gone away. > > this sounds suspicious. 
Are you really sure that this is a new problem? > Btw. is there any reason to use 32b kernel at all? It will always suffer > from a really small lowmem… No this has been a problem for a while. Not sure if this server can handle 64b it’s a bit old. > >> We did just replace the controller in the VessRAID >> as there were some timeouts observed and multiple login/logout >> attempts. >> >> By FS guys do you mean the linux-fsdevel or linux-fsf list? > > yeah linux-fsdevel. No idea what linux-fsf is. It would be great if you > could collect some tracepoints before reporting the issue. At least > those in events/vmscan/*. Will do here’s a perf report: Children Self Command Shared Object Symbol ◆ 63.54% 0.06% swapper [kernel.kallsyms] [k] cpu_startup_entry 62.07% 0.01% swapper [kernel.kallsyms] [k] default_idle_call 62.06% 0.01% swapper [kernel.kallsyms] [k] arch_cpu_idle 54.73% 54.73% swapper [kernel.kallsyms] [k] mwait_idle 45.22% 0.00% swapper [kernel.kallsyms] [k] start_secondary 18.35% 0.00% swapper [kernel.kallsyms] [k] rest_init 18.35% 0.00% swapper [kernel.kallsyms] [k] start_kernel 18.35% 0.00% swapper [kernel.kallsyms] [k] i386_start_kernel 13.97% 0.01% rsync [vdso] [.] 
__kernel_vsyscall
13.93%  0.01%  rsync    [kernel.kallsyms]  [k] sysenter_past_esp
13.92%  0.02%  rsync    [kernel.kallsyms]  [k] do_fast_syscall_32
 8.58%  0.01%  rsync    [kernel.kallsyms]  [k] sys_write
 8.56%  0.01%  rsync    [kernel.kallsyms]  [k] vfs_write
 8.55%  0.01%  rsync    [kernel.kallsyms]  [k] __vfs_write
 8.55%  0.00%  rsync    [kernel.kallsyms]  [k] ext4_file_write_iter
 8.54%  0.00%  rsync    [kernel.kallsyms]  [k] __generic_file_write_iter
 8.36%  0.04%  rsync    [kernel.kallsyms]  [k] generic_perform_write
 6.96%  0.00%  swapper  [kernel.kallsyms]  [k] irq_exit
 6.81%  0.01%  swapper  [kernel.kallsyms]  [k] do_softirq_own_stack
 6.80%  0.01%  swapper  [kernel.kallsyms]  [k] __do_softirq
 6.30%  0.00%  swapper  [kernel.kallsyms]  [k] do_IRQ
 6.30%  0.00%  swapper  [kernel.kallsyms]  [k] common_interrupt
 5.76%  0.01%  swapper  [kernel.kallsyms]  [k] net_rx_action
 5.75%  0.10%  swapper  [kernel.kallsyms]  [k] e1000_clean
 5.33%  0.01%  rsync    [kernel.kallsyms]  [k] sys_read
 5.30%  0.00%  rsync    [kernel.kallsyms]  [k] vfs_read
 5.28%  0.02%  rsync    [kernel.kallsyms]  [k] __vfs_read
 5.23%  0.10%  rsync    [kernel.kallsyms]  [k] generic_file_read_iter
 5.22%  0.00%  swapper  [kernel.kallsyms]  [k] netif_receive_skb_internal
 5.21%  0.01%  swapper  [kernel.kallsyms]  [k] __netif_receive_skb
 5.20%  0.01%  swapper  [kernel.kallsyms]  [k] __netif_receive_skb_core
 5.19%  0.02%  swapper  [kernel.kallsyms]  [k] ip_rcv
 4.93%  0.01%  swapper  [kernel.kallsyms]  [k] ip_rcv_finish
 4.90%  0.01%  swapper  [kernel.kallsyms]  [k] ip_local_deliver
 4.86%  0.05%  swapper  [kernel.kallsyms]  [k] e1000_clean_rx_irq
 4.80%  0.00%  swapper  [kernel.kallsyms]  [k] ip_local_deliver_finish
 4.80%  0.01%  swapper  [kernel.kallsyms]  [k] tcp_v4_rcv
 4.73%  0.01%  swapper  [kernel.kallsyms]  [k] tcp_v4_do_rcv

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS 2017-02-28 16:19 ` Robert Kudyba @ 2017-02-28 16:56 ` Michal Hocko 2017-02-28 19:32 ` Robert Kudyba 0 siblings, 1 reply; 10+ messages in thread From: Michal Hocko @ 2017-02-28 16:56 UTC (permalink / raw) To: Robert Kudyba; +Cc: linux-kernel On Tue 28-02-17 11:19:33, Robert Kudyba wrote: > > > On Feb 28, 2017, at 10:15 AM, Michal Hocko <mhocko@kernel.org> wrote: > > > > On Tue 28-02-17 09:59:35, Robert Kudyba wrote: > >> > >>> On Feb 28, 2017, at 9:40 AM, Michal Hocko <mhocko@kernel.org> wrote: > >>> > >>> On Tue 28-02-17 09:33:49, Robert Kudyba wrote: > >>>> > >>>>> On Feb 28, 2017, at 9:15 AM, Michal Hocko <mhocko@kernel.org> wrote: > >>>>> and this one is hitting the min watermark while there is not really > >>>>> much to reclaim. Only the page cache which might be pinned and not > >>>>> reclaimable from this context because this is GFP_NOFS request. It is > >>>>> not all that surprising the reclaim context fights to get some memory. > >>>>> There is a huge amount of the reclaimable slab which probably just makes > >>>>> a slow progress. > >>>>> > >>>>> That is not something completely surprsing on 32b system I am afraid. > >>>>> > >>>>> Btw. is the stall repeating with the increased time or it gets resolved > >>>>> eventually? > >>>> > >>>> Yes and if you mean by repeating it’s not only affecting rsync but > >>>> you can see just now automount and NetworkManager get these page > >>>> allocation stalls and kswapd0 is getting heavy CPU load, are there any > >>>> other settings I can adjust? > >>> > >>> None that I am aware of. You might want to talk to FS guys, maybe they > >>> can figure out who is pinning file pages so that they cannot be > >>> reclaimed. They do not seem to be dirty or under writeback. It would be > >>> also interesting to see whether that is a regression. The warning is > >>> relatively new so you might have had this problem before just haven't > >>> noticed it. 
> >> 
> >> We have been getting out of memory errors for a while but those seem
> >> to have gone away.
> > 
> > this sounds suspicious. Are you really sure that this is a new problem?
> > Btw. is there any reason to use 32b kernel at all? It will always suffer
> > from a really small lowmem…
> 
> No this has been a problem for a while. Not sure if this server can
> handle 64b it’s a bit old.

Ok, this is unfortunate. There is usually not much interest in fixing
32b issues which are inherent to the memory model used and which are
not regressions (regressions would be fixable), I am afraid.

> >> We did just replace the controller in the VessRAID
> >> as there were some timeouts observed and multiple login/logout
> >> attempts.
> >> 
> >> By FS guys do you mean the linux-fsdevel or linux-fsf list?
> > 
> > yeah linux-fsdevel. No idea what linux-fsf is. It would be great if you
> > could collect some tracepoints before reporting the issue. At least
> > those in events/vmscan/*.
> 
> Will do here’s a perf report:

This will not tell us much. Tracepoints have a much better chance of
telling us how reclaim is progressing.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS 2017-02-28 16:56 ` Michal Hocko @ 2017-02-28 19:32 ` Robert Kudyba 2017-03-01 8:06 ` Michal Hocko 0 siblings, 1 reply; 10+ messages in thread From: Robert Kudyba @ 2017-02-28 19:32 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-kernel > On Feb 28, 2017, at 11:56 AM, Michal Hocko <mhocko@kernel.org> wrote: > > On Tue 28-02-17 11:19:33, Robert Kudyba wrote: >> >>> On Feb 28, 2017, at 10:15 AM, Michal Hocko <mhocko@kernel.org> wrote: >>> >>> On Tue 28-02-17 09:59:35, Robert Kudyba wrote: >>>> >>>>> On Feb 28, 2017, at 9:40 AM, Michal Hocko <mhocko@kernel.org> wrote: >>>>> >>>>> On Tue 28-02-17 09:33:49, Robert Kudyba wrote: >>>>>> >>>>>>> On Feb 28, 2017, at 9:15 AM, Michal Hocko <mhocko@kernel.org> wrote: >>>>>>> and this one is hitting the min watermark while there is not really >>>>>>> much to reclaim. Only the page cache which might be pinned and not >>>>>>> reclaimable from this context because this is GFP_NOFS request. It is >>>>>>> not all that surprising the reclaim context fights to get some memory. >>>>>>> There is a huge amount of the reclaimable slab which probably just makes >>>>>>> a slow progress. >>>>>>> >>>>>>> That is not something completely surprsing on 32b system I am afraid. >>>>>>> >>>>>>> Btw. is the stall repeating with the increased time or it gets resolved >>>>>>> eventually? >>>>>> >>>>>> Yes and if you mean by repeating it’s not only affecting rsync but >>>>>> you can see just now automount and NetworkManager get these page >>>>>> allocation stalls and kswapd0 is getting heavy CPU load, are there any >>>>>> other settings I can adjust? >>>>> >>>>> None that I am aware of. You might want to talk to FS guys, maybe they >>>>> can figure out who is pinning file pages so that they cannot be >>>>> reclaimed. They do not seem to be dirty or under writeback. It would be >>>>> also interesting to see whether that is a regression. 
The warning is >>>>> relatively new so you might have had this problem before just haven't >>>>> noticed it. >>>> >>>> We have been getting out of memory errors for a while but those seem >>>> to have gone away. >>> >>> this sounds suspicious. Are you really sure that this is a new problem? >>> Btw. is there any reason to use 32b kernel at all? It will always suffer >>> from a really small lowmem… >> >> No this has been a problem for a while. Not sure if this server can >> handle 64b it’s a bit old. > > Ok, this is unfortunate. There is usually not much interest to fixing > 32b issues which are inherent to the used memory model and which are not > regressions which would be fixable, I am afraid. > >>>> We did just replace the controller in the VessRAID >>>> as there were some timeouts observed and multiple login/logout >>>> attempts. >>>> >>>> By FS guys do you mean the linux-fsdevel or linux-fsf list? >>> >>> yeah linux-fsdevel. No idea what linux-fsf is. It would be great if you >>> could collect some tracepoints before reporting the issue. At least >>> those in events/vmscan/*. >> >> Will do here’s a perf report: > > this will not tell us much. Tracepoints have much better chance to tell > us how reclaim is progressing. I have SystemTap configured are there any scripts in the SystemTap_Beginners_Guide.pdf that I can run to help? Sorry I’m brand new to tracepoints. 
I do see these “vmscan” tracepoints from this command: stap -L 'kernel.trace("*")'|sort

kernel.trace("vmscan:mm_shrink_slab_end") $shr:struct shrinker* $nid:int $shrinker_retval:int $unused_scan_cnt:long int $new_scan_cnt:long int $total_scan:long int
kernel.trace("vmscan:mm_shrink_slab_start") $shr:struct shrinker* $sc:struct shrink_control* $nr_objects_to_shrink:long int $pgs_scanned:long unsigned int $lru_pgs:long unsigned int $cache_items:long unsigned int $delta:long long unsigned int $total_scan:long unsigned int
kernel.trace("vmscan:mm_vmscan_direct_reclaim_begin") $order:int $may_writepage:int $gfp_flags:gfp_t $classzone_idx:int
kernel.trace("vmscan:mm_vmscan_direct_reclaim_end") $nr_reclaimed:long unsigned int
kernel.trace("vmscan:mm_vmscan_kswapd_sleep") $nid:int
kernel.trace("vmscan:mm_vmscan_kswapd_wake") $nid:int $zid:int $order:int
kernel.trace("vmscan:mm_vmscan_lru_isolate") $classzone_idx:int $order:int $nr_requested:long unsigned int $nr_scanned:long unsigned int $nr_taken:long unsigned int $isolate_mode:isolate_mode_t $file:int
kernel.trace("vmscan:mm_vmscan_lru_shrink_inactive") $nid:int $nr_scanned:long unsigned int $nr_reclaimed:long unsigned int $priority:int $file:int
kernel.trace("vmscan:mm_vmscan_memcg_isolate") $classzone_idx:int $order:int $nr_requested:long unsigned int $nr_scanned:long unsigned int $nr_taken:long unsigned int $isolate_mode:isolate_mode_t $file:int
kernel.trace("vmscan:mm_vmscan_memcg_reclaim_begin") $order:int $may_writepage:int $gfp_flags:gfp_t $classzone_idx:int
kernel.trace("vmscan:mm_vmscan_memcg_reclaim_end") $nr_reclaimed:long unsigned int
kernel.trace("vmscan:mm_vmscan_memcg_softlimit_reclaim_begin") $order:int $may_writepage:int $gfp_flags:gfp_t $classzone_idx:int
kernel.trace("vmscan:mm_vmscan_memcg_softlimit_reclaim_end") $nr_reclaimed:long unsigned int
kernel.trace("vmscan:mm_vmscan_wakeup_kswapd") $nid:int $zid:int $order:int
kernel.trace("vmscan:mm_vmscan_writepage") $page:struct page*

^ permalink raw reply
[flat|nested] 10+ messages in thread
* Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS 2017-02-28 19:32 ` Robert Kudyba @ 2017-03-01 8:06 ` Michal Hocko [not found] ` <F77DA4E6-EF9B-427D-8FE9-9FB940A9B009@fordham.edu> 0 siblings, 1 reply; 10+ messages in thread From: Michal Hocko @ 2017-03-01 8:06 UTC (permalink / raw) To: Robert Kudyba; +Cc: linux-kernel On Tue 28-02-17 14:32:18, Robert Kudyba wrote: > > > On Feb 28, 2017, at 11:56 AM, Michal Hocko <mhocko@kernel.org> wrote: [...] > >> Will do here’s a perf report: > > > > this will not tell us much. Tracepoints have much better chance to tell > > us how reclaim is progressing. > > I have SystemTap configured are there any scripts in the > SystemTap_Beginners_Guide.pdf that I can run to help? Sorry I’m > brand new to tracepoints. > I am not familiar with systemtap much. What I meant was to mount -t tracefs none /trace echo 1 > /trace/events/vmscan/enable but > I do see these “vmscan” from this command: > stap -L 'kernel.trace("*")'|sort > > kernel.trace("vmscan:mm_shrink_slab_end") $shr:struct shrinker* $nid:int $shrinker_retval:int $unused_scan_cnt:long int $new_scan_cnt:long int $total_scan:long int > kernel.trace("vmscan:mm_shrink_slab_start") $shr:struct shrinker* $sc:struct shrink_control* $nr_objects_to_shrink:long int $pgs_scanned:long unsigned int $lru_pgs:long unsigned int $cache_items:long unsigned int $delta:long long unsigned int $total_scan:long unsigned int > kernel.trace("vmscan:mm_vmscan_direct_reclaim_begin") $order:int $may_writepage:int $gfp_flags:gfp_t $classzone_idx:int > kernel.trace("vmscan:mm_vmscan_direct_reclaim_end") $nr_reclaimed:long unsigned int > kernel.trace("vmscan:mm_vmscan_kswapd_sleep") $nid:int > kernel.trace("vmscan:mm_vmscan_kswapd_wake") $nid:int $zid:int $order:int > kernel.trace("vmscan:mm_vmscan_lru_isolate") $classzone_idx:int $order:int $nr_requested:long unsigned int $nr_scanned:long unsigned int $nr_taken:long unsigned int $isolate_mode:isolate_mode_t $file:int > 
kernel.trace("vmscan:mm_vmscan_lru_shrink_inactive") $nid:int $nr_scanned:long unsigned int $nr_reclaimed:long unsigned int $priority:int $file:int > kernel.trace("vmscan:mm_vmscan_memcg_isolate") $classzone_idx:int $order:int $nr_requested:long unsigned int $nr_scanned:long unsigned int $nr_taken:long unsigned int $isolate_mode:isolate_mode_t $file:int > kernel.trace("vmscan:mm_vmscan_memcg_reclaim_begin") $order:int $may_writepage:int $gfp_flags:gfp_t $classzone_idx:int > kernel.trace("vmscan:mm_vmscan_memcg_reclaim_end") $nr_reclaimed:long unsigned int > kernel.trace("vmscan:mm_vmscan_memcg_softlimit_reclaim_begin") $order:int $may_writepage:int $gfp_flags:gfp_t $classzone_idx:int > kernel.trace("vmscan:mm_vmscan_memcg_softlimit_reclaim_end") $nr_reclaimed:long unsigned int > kernel.trace("vmscan:mm_vmscan_wakeup_kswapd") $nid:int $zid:int $order:int > kernel.trace("vmscan:mm_vmscan_writepage") $page:struct page* this looks like it would achieve the same -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <F77DA4E6-EF9B-427D-8FE9-9FB940A9B009@fordham.edu>]
* Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS
  [not found] ` <F77DA4E6-EF9B-427D-8FE9-9FB940A9B009@fordham.edu>
@ 2017-03-01 17:36   ` Michal Hocko
  [not found]     ` <D2C3C9EB-7E99-4420-887A-13526002E267@fordham.edu>
  0 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2017-03-01 17:36 UTC (permalink / raw)
To: Robert Kudyba; +Cc: linux-kernel

On Wed 01-03-17 10:55:33, Robert Kudyba wrote:
> 
> > On Mar 1, 2017, at 3:06 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > 
> > On Tue 28-02-17 14:32:18, Robert Kudyba wrote:
> >> 
> >>> On Feb 28, 2017, at 11:56 AM, Michal Hocko <mhocko@kernel.org <mailto:mhocko@kernel.org>> wrote:
> > [...]
> >>>> Will do here’s a perf report:
> >>> 
> >>> this will not tell us much. Tracepoints have much better chance to tell
> >>> us how reclaim is progressing.
> >> 
> >> I have SystemTap configured are there any scripts in the
> >> SystemTap_Beginners_Guide.pdf that I can run to help? Sorry I’m
> >> brand new to tracepoints.
> >> 
> > 
> > I am not familiar with systemtap much. What I meant was to
> > mount -t tracefs none /trace
> > echo 1 > /trace/events/vmscan/enable
> 
> OK I did this is there another step?

Yeah, you have to read the actual tracing data.
Sorry for not being clear enough:

	cat /trace/trace_pipe > output

> >> I do see these “vmscan” from this command:
> >> stap -L 'kernel.trace("*")'|sort
> >> 
> >> kernel.trace("vmscan:mm_shrink_slab_end") $shr:struct shrinker* $nid:int $shrinker_retval:int $unused_scan_cnt:long int $new_scan_cnt:long int $total_scan:long int
> >> kernel.trace("vmscan:mm_shrink_slab_start") $shr:struct shrinker* $sc:struct shrink_control* $nr_objects_to_shrink:long int $pgs_scanned:long unsigned int $lru_pgs:long unsigned int $cache_items:long unsigned int $delta:long long unsigned int $total_scan:long unsigned int
> >> kernel.trace("vmscan:mm_vmscan_direct_reclaim_begin") $order:int $may_writepage:int $gfp_flags:gfp_t $classzone_idx:int
> >> kernel.trace("vmscan:mm_vmscan_direct_reclaim_end") $nr_reclaimed:long unsigned int
> >> kernel.trace("vmscan:mm_vmscan_kswapd_sleep") $nid:int
> >> kernel.trace("vmscan:mm_vmscan_kswapd_wake") $nid:int $zid:int $order:int
> >> kernel.trace("vmscan:mm_vmscan_lru_isolate") $classzone_idx:int $order:int $nr_requested:long unsigned int $nr_scanned:long unsigned int $nr_taken:long unsigned int $isolate_mode:isolate_mode_t $file:int
> >> kernel.trace("vmscan:mm_vmscan_lru_shrink_inactive") $nid:int $nr_scanned:long unsigned int $nr_reclaimed:long unsigned int $priority:int $file:int
> >> kernel.trace("vmscan:mm_vmscan_memcg_isolate") $classzone_idx:int $order:int $nr_requested:long unsigned int $nr_scanned:long unsigned int $nr_taken:long unsigned int $isolate_mode:isolate_mode_t $file:int
> >> kernel.trace("vmscan:mm_vmscan_memcg_reclaim_begin") $order:int $may_writepage:int $gfp_flags:gfp_t $classzone_idx:int
> >> kernel.trace("vmscan:mm_vmscan_memcg_reclaim_end") $nr_reclaimed:long unsigned int
> >> kernel.trace("vmscan:mm_vmscan_memcg_softlimit_reclaim_begin") $order:int $may_writepage:int $gfp_flags:gfp_t $classzone_idx:int
> >> kernel.trace("vmscan:mm_vmscan_memcg_softlimit_reclaim_end") $nr_reclaimed:long unsigned int
> >> 
> >> kernel.trace("vmscan:mm_vmscan_wakeup_kswapd") $nid:int $zid:int $order:int
> >> kernel.trace("vmscan:mm_vmscan_writepage") $page:struct page*
> > 
> > this looks like it would achieve the same
> 
> Is there anything else I can provide?

I am not familiar with filesystems and their tracepoints which might tell
us more.

[...]
> Mar  1 06:30:59 curie kernel: kworker/u16:1:
> Mar  1 06:30:59 curie kernel: kthreadd: page allocation stalls for 10197ms, order:1
> Mar  1 06:30:59 curie kernel: page allocation stalls for 11274ms, order:1

OK, so unlike in the previous situation, this is a higher-order request
(aka 2 physically contiguous pages).

> Mar  1 06:31:02 curie kernel: Normal free:130224kB min:128484kB low:160604kB high:192724kB active_anon:0kB inactive_anon:20kB active_file:308296kB inactive_file:3864kB unevictable:0kB writepending:0kB present:892920kB managed:791152kB mlocked:0kB slab_reclaimable:271984kB slab_unreclaimable:35880kB kernel_stack:1808kB pagetables:0kB bounce:0kB free_pcp:248kB local_pcp:0kB free_cma:0kB

In this case there is a lot of page cache to be reclaimed, most of it on
the active LRU list. There was a bug which could result in this, fixed by
b4536f0c829c ("mm, memcg: fix the active list aging for lowmem requests
when memcg is enabled"), which made it into the stable tree in 4.9.5, but
you have said that you are using 4.9.12, so you should already have it.
So it seems that this page cache is indeed activated based on the usage
pattern.

The rest of the report is rather messed up but I assume that you simply
do not have contiguous memory in the lowmem. This surely sounds like a
32b-specific problem which is not reasonably fixable.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread
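[Editor's note: a minimal sketch of how the captured trace_pipe output can be summarized, assuming the capture file format shown later in this thread. The heredoc stands in for the "output" file; the two sample lines are quoted from the report further down.]

```shell
# Sum nr_scanned / nr_reclaimed across mm_vmscan_lru_shrink_inactive
# events to gauge how well reclaim is progressing. In a live session
# you would feed the file written from /trace/trace_pipe instead of
# this heredoc.
summary=$(awk '
/mm_vmscan_lru_shrink_inactive/ {
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^nr_scanned=/)   { split($i, a, "="); scanned   += a[2] }
        if ($i ~ /^nr_reclaimed=/) { split($i, b, "="); reclaimed += b[2] }
    }
}
END { printf "scanned=%d reclaimed=%d", scanned, reclaimed }
' <<'EOF'
kswapd0-52 [001] .... 90900.812734: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=32 nr_reclaimed=32 priority=12 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
kswapd0-52 [001] .... 90900.813762: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=1 nr_reclaimed=0 priority=11 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
EOF
)
echo "$summary"
```

A reclaimed count close to the scanned count, as here, means reclaim is making progress rather than spinning.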
[parent not found: <D2C3C9EB-7E99-4420-887A-13526002E267@fordham.edu>]
* Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS
  [not found] ` <D2C3C9EB-7E99-4420-887A-13526002E267@fordham.edu>
@ 2017-03-01 19:19   ` Michal Hocko
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2017-03-01 19:19 UTC (permalink / raw)
  To: Robert Kudyba; +Cc: linux-kernel

On Wed 01-03-17 13:06:57, Robert Kudyba wrote:
[...]
> Does this ever finish?

Once there is no memory pressure.

> The output file seems to grow really quickly. Here are a few lines and
> I can upload the file somewhere if you think it'd be helpful:
[...]
> kswapd0-52 [001] .... 90900.812734: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=32 nr_reclaimed=32 priority=12 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> kswapd0-52 [001] .... 90900.813762: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=1 nr_reclaimed=0 priority=11 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> kswapd0-52 [001] .... 90900.813928: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=32 nr_reclaimed=32 priority=11 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> kswapd0-52 [001] .... 90900.814823: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=22 nr_reclaimed=22 priority=11 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> kswapd0-52 [001] .... 90900.815514: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=3 nr_reclaimed=0 priority=10 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> kswapd0-52 [001] .... 90900.815643: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=32 nr_reclaimed=32 priority=10 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
> kswapd0-52 [002] .... 91073.571793: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=27 nr_reclaimed=27 priority=5 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC

All those are from kswapd (background memory reclaim), which means the
trace doesn't catch any allocation which can stall for too long. Anyway,
the above tracepoints show that we are able to make some progress during
the reclaim (nr_reclaimed > 0).
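[Editor's note: a hedged sketch of one way to check the observation above, i.e. that every captured vmscan event came from kswapd rather than from a direct-reclaiming task such as rsync. The heredoc stands in for the captured output file; the lines are quoted from the report above.]

```shell
# Count vmscan events per reclaiming task. If only kswapd0 appears, the
# trace shows background reclaim only, with no direct reclaim stalls
# attributed to the allocating process.
by_task=$(awk '/mm_vmscan/ { split($1, t, "-"); n[t[1]]++ }
END { for (task in n) printf "%s %d\n", task, n[task] }' <<'EOF'
kswapd0-52 [001] .... 90900.812734: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=32 nr_reclaimed=32
kswapd0-52 [002] .... 91073.571793: mm_vmscan_lru_shrink_inactive: nid=0 nr_scanned=27 nr_reclaimed=27
EOF
)
echo "$by_task"
```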
So I suspect that this is indeed a large lowmem pressure and I do not
see what we can do about that...

> > The rest of the report is rather messed up but I assume that you simply
> > do not have contiguous memory in the lowmem. This surely sounds like a
> > 32b specific problem which is not reasonably fixable.
> 
> Messed up as in unreadable?

Yeah, 2 reports interleaved.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 10+ messages in thread
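[Editor's note: a minimal sketch of how to check for the lowmem fragmentation diagnosed above. The order:1 stalls need two physically contiguous pages; /proc/buddyinfo reports free blocks per order per zone. The sample line below is illustrative, not taken from the affected machine, where you would read /proc/buddyinfo directly.]

```shell
# After the zone name, /proc/buddyinfo columns are free-block counts for
# order 0, 1, 2, ... Extract the order-1 count (8 KiB blocks on i686)
# for the Normal zone, which is what the stalled allocations need.
line='Node 0, zone   Normal     12      0      3      1      0      0      0      0      0      0      0'
order1=$(echo "$line" | awk '{ for (i = 1; i <= NF; i++) if ($i == "Normal") print $(i + 2) }')
echo "order-1 free blocks in Normal zone: $order1"
```

A zero here, despite plenty of order-0 pages, is the fragmentation signature: free lowmem exists but not in contiguous pairs.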
end of thread, other threads:[~2017-03-01 19:21 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-27 21:36 rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS Robert Kudyba
2017-02-28 14:15 ` Michal Hocko
     [not found]   ` <40F07E96-7468-4355-B8EA-4B42F575ACAB@fordham.edu>
2017-02-28 14:40     ` Michal Hocko
     [not found]       ` <3E4C7821-A93D-4956-A0E0-730BEC67C9F0@fordham.edu>
2017-02-28 15:15         ` Michal Hocko
2017-02-28 16:19           ` Robert Kudyba
2017-02-28 16:56             ` Michal Hocko
2017-02-28 19:32               ` Robert Kudyba
2017-03-01  8:06                 ` Michal Hocko
     [not found]                   ` <F77DA4E6-EF9B-427D-8FE9-9FB940A9B009@fordham.edu>
2017-03-01 17:36                     ` Michal Hocko
     [not found]                       ` <D2C3C9EB-7E99-4420-887A-13526002E267@fordham.edu>
2017-03-01 19:19                         ` Michal Hocko