From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751485AbdB1Q5C (ORCPT ); Tue, 28 Feb 2017 11:57:02 -0500
Received: from mx2.suse.de ([195.135.220.15]:50064 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751356AbdB1Q5A (ORCPT ); Tue, 28 Feb 2017 11:57:00 -0500
Date: Tue, 28 Feb 2017 17:56:39 +0100
From: Michal Hocko
To: Robert Kudyba
Cc: linux-kernel@vger.kernel.org
Subject: Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS
Message-ID: <20170228165638.GA27726@dhcp22.suse.cz>
References: <20170228141520.GA28139@dhcp22.suse.cz>
	<40F07E96-7468-4355-B8EA-4B42F575ACAB@fordham.edu>
	<20170228144045.GD26792@dhcp22.suse.cz>
	<3E4C7821-A93D-4956-A0E0-730BEC67C9F0@fordham.edu>
	<20170228151535.GE26792@dhcp22.suse.cz>
	<63A3D887-EEDA-46D2-AB59-D5955FC3D23D@fordham.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <63A3D887-EEDA-46D2-AB59-D5955FC3D23D@fordham.edu>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 28-02-17 11:19:33, Robert Kudyba wrote:
>
> > On Feb 28, 2017, at 10:15 AM, Michal Hocko wrote:
> >
> > On Tue 28-02-17 09:59:35, Robert Kudyba wrote:
> >>
> >>> On Feb 28, 2017, at 9:40 AM, Michal Hocko wrote:
> >>>
> >>> On Tue 28-02-17 09:33:49, Robert Kudyba wrote:
> >>>>
> >>>>> On Feb 28, 2017, at 9:15 AM, Michal Hocko wrote:
> >>>>> and this one is hitting the min watermark while there is not really
> >>>>> much to reclaim. There is only the page cache, which might be pinned
> >>>>> and not reclaimable from this context because this is a GFP_NOFS
> >>>>> request. It is not all that surprising that the reclaim context fights
> >>>>> to get some memory. There is a huge amount of reclaimable slab, which
> >>>>> is probably just making slow progress.
> >>>>>
> >>>>> That is not something completely surprising on a 32b system, I am afraid.
> >>>>>
> >>>>> Btw. is the stall repeating with increasing time, or does it get
> >>>>> resolved eventually?
> >>>>
> >>>> Yes, and if by repeating you mean it's not only affecting rsync: just
> >>>> now you can see automount and NetworkManager get these page allocation
> >>>> stalls, and kswapd0 is getting heavy CPU load. Are there any other
> >>>> settings I can adjust?
> >>>
> >>> None that I am aware of. You might want to talk to the FS guys; maybe
> >>> they can figure out who is pinning file pages so that they cannot be
> >>> reclaimed. They do not seem to be dirty or under writeback. It would
> >>> also be interesting to see whether this is a regression. The warning is
> >>> relatively new, so you might have had this problem before and just not
> >>> noticed it.
> >>
> >> We have been getting out of memory errors for a while, but those seem
> >> to have gone away.
> >
> > This sounds suspicious. Are you really sure that this is a new problem?
> > Btw. is there any reason to use a 32b kernel at all? It will always
> > suffer from a really small lowmem…
>
> No, this has been a problem for a while. Not sure if this server can
> handle 64b; it's a bit old.

Ok, this is unfortunate. I am afraid there is usually not much interest in
fixing 32b issues that are inherent to the memory model in use rather than
fixable regressions.

> >> We did just replace the controller in the VessRAID
> >> as there were some timeouts observed and multiple login/logout
> >> attempts.
> >>
> >> By FS guys do you mean the linux-fsdevel or linux-fsf list?
> >
> > Yeah, linux-fsdevel. No idea what linux-fsf is. It would be great if you
> > could collect some tracepoints before reporting the issue, at least
> > those in events/vmscan/*.
>
> Will do. Here's a perf report:

This will not tell us much. Tracepoints have a much better chance of
telling us how reclaim is progressing.
-- 
Michal Hocko
SUSE Labs
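A hedged illustration of working with the events/vmscan/* tracepoints requested above: with tracefs at /sys/kernel/debug/tracing, writing 1 to events/vmscan/enable turns the vmscan events on and trace_pipe streams them as text. The Python sketch below assumes the default ftrace text line layout (which varies between kernel versions) and pairs mm_vmscan_direct_reclaim_begin/_end events per PID to show how long each direct reclaim episode took. The regex, the function name direct_reclaim_stalls, and the sample lines are hypothetical and made up for illustration, not captured from the reporter's system.

```python
import re

# Assumed layout of one ftrace text line (hypothetical example):
#   rsync-2231  [001] ....  5012.345678: mm_vmscan_direct_reclaim_begin: order=0 ...
# Column details differ between kernel versions; adjust the regex if needed.
LINE_RE = re.compile(
    r"^\s*(?P<task>.+?)-(?P<pid>\d+)\s+\[\d+\].*?\s(?P<ts>\d+\.\d+):\s+"
    r"(?P<event>\w+):\s*(?P<args>.*)$"
)


def direct_reclaim_stalls(lines):
    """Pair direct reclaim begin/end events per PID.

    Returns (task, pid, seconds, nr_reclaimed) tuples, one per completed
    direct reclaim episode; unmatched begin events are ignored.
    """
    open_episodes = {}  # pid -> (task name, begin timestamp)
    stalls = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # header comments and other non-event lines
        pid = int(m["pid"])
        ts = float(m["ts"])
        if m["event"] == "mm_vmscan_direct_reclaim_begin":
            open_episodes[pid] = (m["task"], ts)
        elif m["event"] == "mm_vmscan_direct_reclaim_end" and pid in open_episodes:
            task, start = open_episodes.pop(pid)
            nr = re.search(r"nr_reclaimed=(\d+)", m["args"])
            stalls.append((task, pid, ts - start, int(nr.group(1)) if nr else None))
    return stalls


# Made-up sample input, for illustration only.
sample = [
    "# tracer: nop",
    "           rsync-2231  [001] ....  5012.345678: mm_vmscan_direct_reclaim_begin: order=0 may_writepage=1 gfp_flags=GFP_NOFS",
    "  NetworkManager-812   [000] ....  5012.400000: mm_vmscan_direct_reclaim_begin: order=0 may_writepage=1 gfp_flags=GFP_KERNEL",
    "           rsync-2231  [001] ....  5012.845678: mm_vmscan_direct_reclaim_end: nr_reclaimed=12",
]

for task, pid, secs, nr in direct_reclaim_stalls(sample):
    # prints: rsync[2231] stalled 0.500s, reclaimed 12 pages
    print(f"{task}[{pid}] stalled {secs:.3f}s, reclaimed {nr} pages")
```

Long episodes that reclaim few pages would be consistent with the slow-progress picture described in the thread, but only real traces from the affected machine can show that.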