From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752364AbdB1PPl (ORCPT <rfc822;w@1wt.eu>);
        Tue, 28 Feb 2017 10:15:41 -0500
Received: from mx2.suse.de ([195.135.220.15]:40241 "EHLO mx2.suse.de"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751670AbdB1PPi (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 28 Feb 2017 10:15:38 -0500
Date: Tue, 28 Feb 2017 16:15:35 +0100
From: Michal Hocko <mhocko@kernel.org>
To: Robert Kudyba <rkudyba@fordham.edu>
Cc: linux-kernel@vger.kernel.org
Subject: Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS
Message-ID: <20170228151535.GE26792@dhcp22.suse.cz>
References: <C16ACE34-A2F0-4A9F-BFBD-E369733A214F@fordham.edu>
 <20170228141520.GA28139@dhcp22.suse.cz>
 <40F07E96-7468-4355-B8EA-4B42F575ACAB@fordham.edu>
 <20170228144045.GD26792@dhcp22.suse.cz>
 <3E4C7821-A93D-4956-A0E0-730BEC67C9F0@fordham.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <3E4C7821-A93D-4956-A0E0-730BEC67C9F0@fordham.edu>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 28-02-17 09:59:35, Robert Kudyba wrote:
> 
> > On Feb 28, 2017, at 9:40 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > 
> > On Tue 28-02-17 09:33:49, Robert Kudyba wrote:
> >> 
> >>> On Feb 28, 2017, at 9:15 AM, Michal Hocko <mhocko@kernel.org> wrote:
> >>> and this one is hitting the min watermark while there is not really
> >>> much to reclaim. Only the page cache which might be pinned and not
> >>> reclaimable from this context because this is GFP_NOFS request. It is
> >>> not all that surprising the reclaim context fights to get some memory.
> >>> There is a huge amount of reclaimable slab, which probably just makes
> >>> slow progress.
> >>> 
> >>> That is not something completely surprising on a 32b system I am afraid.
> >>> 
> >>> Btw. does the stall keep repeating with increasing stall times, or does
> >>> it get resolved eventually?
> >> 
> >> Yes, and if by repeating you mean recurring: it is not only affecting
> >> rsync. Just now automount and NetworkManager also got these page
> >> allocation stalls, and kswapd0 is under heavy CPU load. Are there any
> >> other settings I can adjust?
> > 
> > None that I am aware of. You might want to talk to FS guys, maybe they
> > can figure out who is pinning file pages so that they cannot be
> > reclaimed. They do not seem to be dirty or under writeback. It would
> > also be interesting to see whether that is a regression. The warning is
> > relatively new, so you might have had this problem before and just not
> > noticed it.
> 
> We have been getting out of memory errors for a while but those seem
> to have gone away.

this sounds suspicious. Are you really sure that this is a new problem?
Btw. is there any reason to use a 32b kernel at all? It will always suffer
from a really small lowmem...
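
To see how small lowmem is on a given box, something like the sketch
below might help. It assumes a 32b kernel built with highmem support,
where /proc/meminfo carries LowTotal and LowFree lines; the helper name
lowmem_summary is made up for illustration.

```python
def lowmem_summary(meminfo_text):
    """Return (LowTotal, LowFree) in kB, or None if the fields are absent."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    if "LowTotal" not in fields or "LowFree" not in fields:
        # Assumption: these lines are missing on kernels without a
        # lowmem/highmem split (e.g. 64b kernels).
        return None
    return fields["LowTotal"], fields["LowFree"]
```

Feeding it open("/proc/meminfo").read() gives the two numbers in kB;
a LowFree that stays close to zero while the stalls happen would fit
the picture above.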

> We did just replace the controller in the VessRAID
> as there were some timeouts observed and multiple login/logout
> attempts.
> 
> By FS guys do you mean the linux-fsdevel or linux-fsf list?

yeah linux-fsdevel. No idea what linux-fsf is. It would be great if you
could collect some tracepoints before reporting the issue. At least
those in events/vmscan/*.
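
A rough sketch of how the collection could be scripted follows. It
assumes tracefs is reachable at /sys/kernel/debug/tracing (it may be
mounted elsewhere on your system) and that the script runs as root;
the function names are only illustrative.

```python
import os
import time

TRACEFS = "/sys/kernel/debug/tracing"  # assumed mount point


def set_vmscan_events(enabled, tracefs=TRACEFS):
    """Switch every tracepoint under events/vmscan/ on or off."""
    path = os.path.join(tracefs, "events", "vmscan", "enable")
    with open(path, "w") as f:
        f.write("1" if enabled else "0")


def save_trace(out_path, tracefs=TRACEFS):
    """Copy the current contents of the trace buffer to out_path."""
    with open(os.path.join(tracefs, "trace")) as src:
        with open(out_path, "w") as dst:
            for line in src:
                dst.write(line)


def capture(seconds, out_path, tracefs=TRACEFS):
    """Record vmscan events for `seconds`, then save them to out_path."""
    set_vmscan_events(True, tracefs)
    try:
        time.sleep(seconds)
    finally:
        # Always switch the events off again, even if interrupted.
        set_vmscan_events(False, tracefs)
    save_trace(out_path, tracefs)
```

Calling capture(60, "vmscan-trace.txt") while the stalls are happening
should leave a file that can be attached to the report.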

-- 
Michal Hocko
SUSE Labs