Date: Tue, 28 Feb 2017 16:15:35 +0100
From: Michal Hocko
To: Robert Kudyba
Cc: linux-kernel@vger.kernel.org
Subject: Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS
Message-ID: <20170228151535.GE26792@dhcp22.suse.cz>
References: <20170228141520.GA28139@dhcp22.suse.cz> <40F07E96-7468-4355-B8EA-4B42F575ACAB@fordham.edu> <20170228144045.GD26792@dhcp22.suse.cz> <3E4C7821-A93D-4956-A0E0-730BEC67C9F0@fordham.edu>
In-Reply-To: <3E4C7821-A93D-4956-A0E0-730BEC67C9F0@fordham.edu>

On Tue 28-02-17 09:59:35, Robert Kudyba wrote:
>
> > On Feb 28, 2017, at 9:40 AM, Michal Hocko wrote:
> >
> > On Tue 28-02-17 09:33:49, Robert Kudyba wrote:
> >>
> >>> On Feb 28, 2017, at 9:15 AM, Michal Hocko wrote:
> >>> and this one is hitting the min watermark while there is not really
> >>> much to reclaim. Only the page cache which might be pinned and not
> >>> reclaimable from this context because this is GFP_NOFS request. It is
> >>> not all that surprising the reclaim context fights to get some memory.
> >>> There is a huge amount of the reclaimable slab which probably just makes
> >>> a slow progress.
> >>>
> >>> That is not something completely surprising on 32b system I am afraid.
> >>>
> >>> Btw. is the stall repeating with the increased time or it gets resolved
> >>> eventually?
> >>
> >> Yes and if you mean by repeating it’s not only affecting rsync but
> >> you can see just now automount and NetworkManager get these page
> >> allocation stalls and kswapd0 is getting heavy CPU load, are there any
> >> other settings I can adjust?
> >
> > None that I am aware of. You might want to talk to FS guys, maybe they
> > can figure out who is pinning file pages so that they cannot be
> > reclaimed. They do not seem to be dirty or under writeback. It would be
> > also interesting to see whether that is a regression. The warning is
> > relatively new so you might have had this problem before just haven't
> > noticed it.
>
> We have been getting out of memory errors for a while but those seem
> to have gone away.

This sounds suspicious. Are you really sure that this is a new problem?
Btw. is there any reason to use a 32b kernel at all? It will always suffer
from a really small lowmem...

> We did just replace the controller in the VessRAID
> as there were some timeouts observed and multiple login/logout
> attempts.
>
> By FS guys do you mean the linux-fsdevel or linux-fsf list?

Yeah, linux-fsdevel. No idea what linux-fsf is. It would be great if you
could collect some tracepoints before reporting the issue. At least
those in events/vmscan/*.
-- 
Michal Hocko
SUSE Labs
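
For reference, collecting the events/vmscan/* tracepoints mentioned above
could look roughly like the sketch below. It assumes tracefs is reachable
under /sys/kernel/debug/tracing (on some systems it is mounted at
/sys/kernel/tracing instead) and that it is run as root. The function
names set_vmscan_events and capture_trace are made up for illustration
and are not part of any existing tool.

```python
import os

# Assumption: tracefs location; adjust if it is mounted at /sys/kernel/tracing.
TRACING_ROOT = "/sys/kernel/debug/tracing"


def set_vmscan_events(enabled, tracing_root=TRACING_ROOT):
    """Write 1 or 0 to events/vmscan/enable, toggling all vmscan tracepoints."""
    path = os.path.join(tracing_root, "events", "vmscan", "enable")
    with open(path, "w") as f:
        f.write("1" if enabled else "0")
    return path


def capture_trace(out_path, max_lines, tracing_root=TRACING_ROOT):
    """Copy up to max_lines lines from trace_pipe to out_path; return the count.

    Reading trace_pipe consumes the events and, on a live system, blocks
    until more events arrive.
    """
    if max_lines <= 0:
        return 0
    count = 0
    pipe_path = os.path.join(tracing_root, "trace_pipe")
    with open(pipe_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(line)
            count += 1
            if count >= max_lines:
                break
    return count
```

A typical run would call set_vmscan_events(True), reproduce the stall
(e.g. start the rsync), call capture_trace("/tmp/vmscan.trace", 100000),
and finally set_vmscan_events(False). The output path and the line limit
here are arbitrary placeholders.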