From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751513AbdB1Quw (ORCPT ); Tue, 28 Feb 2017 11:50:52 -0500
Received: from mail-qk0-f179.google.com ([209.85.220.179]:33507 "EHLO
        mail-qk0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751037AbdB1Quq (ORCPT );
        Tue, 28 Feb 2017 11:50:46 -0500
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 9.3 \(3124\))
Subject: Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS
From: Robert Kudyba
In-Reply-To: <20170228151535.GE26792@dhcp22.suse.cz>
Date: Tue, 28 Feb 2017 11:19:33 -0500
Cc: linux-kernel@vger.kernel.org
Message-Id: <63A3D887-EEDA-46D2-AB59-D5955FC3D23D@fordham.edu>
References: <20170228141520.GA28139@dhcp22.suse.cz>
        <40F07E96-7468-4355-B8EA-4B42F575ACAB@fordham.edu>
        <20170228144045.GD26792@dhcp22.suse.cz>
        <3E4C7821-A93D-4956-A0E0-730BEC67C9F0@fordham.edu>
        <20170228151535.GE26792@dhcp22.suse.cz>
To: Michal Hocko
X-Mailer: Apple Mail (2.3124)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by mail.home.local id v1SGou0A014470

> On Feb 28, 2017, at 10:15 AM, Michal Hocko wrote:
>
> On Tue 28-02-17 09:59:35, Robert Kudyba wrote:
>>
>>> On Feb 28, 2017, at 9:40 AM, Michal Hocko wrote:
>>>
>>> On Tue 28-02-17 09:33:49, Robert Kudyba wrote:
>>>>
>>>>> On Feb 28, 2017, at 9:15 AM, Michal Hocko wrote:
>>>>> and this one is hitting the min watermark while there is not really
>>>>> much to reclaim. Only the page cache which might be pinned and not
>>>>> reclaimable from this context because this is GFP_NOFS request. It is
>>>>> not all that surprising the reclaim context fights to get some memory.
>>>>> There is a huge amount of the reclaimable slab which probably just makes
>>>>> a slow progress.
>>>>>
>>>>> That is not something completely surprising on 32b system I am afraid.
>>>>>
>>>>> Btw. is the stall repeating with the increased time or it gets resolved
>>>>> eventually?
>>>>
>>>> Yes and if you mean by repeating it’s not only affecting rsync but
>>>> you can see just now automount and NetworkManager get these page
>>>> allocation stalls and kswapd0 is getting heavy CPU load, are there any
>>>> other settings I can adjust?
>>>
>>> None that I am aware of. You might want to talk to FS guys, maybe they
>>> can figure out who is pinning file pages so that they cannot be
>>> reclaimed. They do not seem to be dirty or under writeback. It would be
>>> also interesting to see whether that is a regression. The warning is
>>> relatively new so you might have had this problem before just haven't
>>> noticed it.
>>
>> We have been getting out of memory errors for a while but those seem
>> to have gone away.
>
> this sounds suspicious. Are you really sure that this is a new problem?
> Btw. is there any reason to use 32b kernel at all? It will always suffer
> from a really small lowmem…

No, this has been a problem for a while. Not sure if this server can
handle 64b; it’s a bit old.

>
>> We did just replace the controller in the VessRAID
>> as there were some timeouts observed and multiple login/logout
>> attempts.
>>
>> By FS guys do you mean the linux-fsdevel or linux-fsf list?
>
> yeah linux-fsdevel. No idea what linux-fsf is. It would be great if you
> could collect some tracepoints before reporting the issue. At least
> those in events/vmscan/*.
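
For reference, a minimal sketch of how the events/vmscan/* tracepoints
mentioned above could be collected through the ftrace interface. It
assumes tracefs is reachable at /sys/kernel/debug/tracing (on some
systems it is mounted at /sys/kernel/tracing instead) and that it runs
as root. The function names and the output file are hypothetical; only
the events/vmscan/enable and trace_pipe files are standard ftrace
interfaces.

```python
import os
from itertools import islice

# Assumption: debugfs-style mount point; adjust if tracefs lives elsewhere.
TRACEFS = "/sys/kernel/debug/tracing"


def set_vmscan_events(tracefs, enabled):
    """Write 1 or 0 to events/vmscan/enable, toggling all vmscan tracepoints."""
    path = os.path.join(tracefs, "events", "vmscan", "enable")
    with open(path, "w") as f:
        f.write("1" if enabled else "0")
    return path


def capture(tracefs, out_path, max_lines):
    """Copy at most max_lines lines from trace_pipe into out_path.

    On a live system trace_pipe blocks until events arrive, so this
    returns only once max_lines events were seen (or the read is interrupted).
    """
    copied = 0
    src_path = os.path.join(tracefs, "trace_pipe")
    with open(src_path) as src, open(out_path, "w") as dst:
        for line in islice(src, max_lines):
            dst.write(line)
            copied += 1
    return copied
```

A typical run would call set_vmscan_events(TRACEFS, True) while the
rsync is stalling, then capture(TRACEFS, "vmscan-trace.txt", 100000),
and finally set_vmscan_events(TRACEFS, False) to switch the events off
again.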

Will do. Here’s a perf report:

  Children      Self  Command  Shared Object      Symbol
    63.54%     0.06%  swapper  [kernel.kallsyms]  [k] cpu_startup_entry
    62.07%     0.01%  swapper  [kernel.kallsyms]  [k] default_idle_call
    62.06%     0.01%  swapper  [kernel.kallsyms]  [k] arch_cpu_idle
    54.73%    54.73%  swapper  [kernel.kallsyms]  [k] mwait_idle
    45.22%     0.00%  swapper  [kernel.kallsyms]  [k] start_secondary
    18.35%     0.00%  swapper  [kernel.kallsyms]  [k] rest_init
    18.35%     0.00%  swapper  [kernel.kallsyms]  [k] start_kernel
    18.35%     0.00%  swapper  [kernel.kallsyms]  [k] i386_start_kernel
    13.97%     0.01%  rsync    [vdso]             [.] __kernel_vsyscall
    13.93%     0.01%  rsync    [kernel.kallsyms]  [k] sysenter_past_esp
    13.92%     0.02%  rsync    [kernel.kallsyms]  [k] do_fast_syscall_32
     8.58%     0.01%  rsync    [kernel.kallsyms]  [k] sys_write
     8.56%     0.01%  rsync    [kernel.kallsyms]  [k] vfs_write
     8.55%     0.01%  rsync    [kernel.kallsyms]  [k] __vfs_write
     8.55%     0.00%  rsync    [kernel.kallsyms]  [k] ext4_file_write_iter
     8.54%     0.00%  rsync    [kernel.kallsyms]  [k] __generic_file_write_iter
     8.36%     0.04%  rsync    [kernel.kallsyms]  [k] generic_perform_write
     6.96%     0.00%  swapper  [kernel.kallsyms]  [k] irq_exit
     6.81%     0.01%  swapper  [kernel.kallsyms]  [k] do_softirq_own_stack
     6.80%     0.01%  swapper  [kernel.kallsyms]  [k] __do_softirq
     6.30%     0.00%  swapper  [kernel.kallsyms]  [k] do_IRQ
     6.30%     0.00%  swapper  [kernel.kallsyms]  [k] common_interrupt
     5.76%     0.01%  swapper  [kernel.kallsyms]  [k] net_rx_action
     5.75%     0.10%  swapper  [kernel.kallsyms]  [k] e1000_clean
     5.33%     0.01%  rsync    [kernel.kallsyms]  [k] sys_read
     5.30%     0.00%  rsync    [kernel.kallsyms]  [k] vfs_read
     5.28%     0.02%  rsync    [kernel.kallsyms]  [k] __vfs_read
     5.23%     0.10%  rsync    [kernel.kallsyms]  [k] generic_file_read_iter
     5.22%     0.00%  swapper  [kernel.kallsyms]  [k] netif_receive_skb_internal
     5.21%     0.01%  swapper  [kernel.kallsyms]  [k] __netif_receive_skb
     5.20%     0.01%  swapper  [kernel.kallsyms]  [k] __netif_receive_skb_core
     5.19%     0.02%  swapper  [kernel.kallsyms]  [k] ip_rcv
     4.93%     0.01%  swapper  [kernel.kallsyms]  [k] ip_rcv_finish
     4.90%     0.01%  swapper  [kernel.kallsyms]  [k] ip_local_deliver
     4.86%     0.05%  swapper  [kernel.kallsyms]  [k] e1000_clean_rx_irq
     4.80%     0.00%  swapper  [kernel.kallsyms]  [k] ip_local_deliver_finish
     4.80%     0.01%  swapper  [kernel.kallsyms]  [k] tcp_v4_rcv
     4.73%     0.01%  swapper  [kernel.kallsyms]  [k] tcp_v4_do_rcv