From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752570AbdCARht (ORCPT ); Wed, 1 Mar 2017 12:37:49 -0500 Received: from mx2.suse.de ([195.135.220.15]:34799 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751073AbdCARhq (ORCPT ); Wed, 1 Mar 2017 12:37:46 -0500 Date: Wed, 1 Mar 2017 18:36:26 +0100 From: Michal Hocko To: Robert Kudyba Cc: linux-kernel@vger.kernel.org Subject: Re: rsync: page allocation stalls in kernel 4.9.10 to a VessRAID NAS Message-ID: <20170301173625.GA20360@dhcp22.suse.cz> References: <20170228141520.GA28139@dhcp22.suse.cz> <40F07E96-7468-4355-B8EA-4B42F575ACAB@fordham.edu> <20170228144045.GD26792@dhcp22.suse.cz> <3E4C7821-A93D-4956-A0E0-730BEC67C9F0@fordham.edu> <20170228151535.GE26792@dhcp22.suse.cz> <63A3D887-EEDA-46D2-AB59-D5955FC3D23D@fordham.edu> <20170228165638.GA27726@dhcp22.suse.cz> <20170301080604.GB1124@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed 01-03-17 10:55:33, Robert Kudyba wrote: > > > On Mar 1, 2017, at 3:06 AM, Michal Hocko wrote: > > > > On Tue 28-02-17 14:32:18, Robert Kudyba wrote: > >> > >>> On Feb 28, 2017, at 11:56 AM, Michal Hocko > wrote: > > [...] > >>>> Will do here’s a perf report: > >>> > >>> this will not tell us much. Tracepoints have much better chance to tell > >>> us how reclaim is progressing. > >> > >> I have SystemTap configured are there any scripts in the > >> SystemTap_Beginners_Guide.pdf that I can run to help? Sorry I’m > >> brand new to tracepoints. > >> > > > > I am not familiar with systemtap much. What I meant was to > > mount -t tracefs none /trace > > echo 1 > /trace/events/vmscan/enable > > OK I did this is there another step? Yeah, you have to read the actual tracing data. Sorry for not beaing clear enough cat /trace/trace_pipe > output > >> I do see these “vmscan” from this command: > >> stap -L 'kernel.trace("*")'|sort > >> > >> kernel.trace("vmscan:mm_shrink_slab_end") $shr:struct shrinker* $nid:int $shrinker_retval:int $unused_scan_cnt:long int $new_scan_cnt:long int $total_scan:long int > >> kernel.trace("vmscan:mm_shrink_slab_start") $shr:struct shrinker* $sc:struct shrink_control* $nr_objects_to_shrink:long int $pgs_scanned:long unsigned int $lru_pgs:long unsigned int $cache_items:long unsigned int $delta:long long unsigned int $total_scan:long unsigned int > >> kernel.trace("vmscan:mm_vmscan_direct_reclaim_begin") $order:int $may_writepage:int $gfp_flags:gfp_t $classzone_idx:int > >> kernel.trace("vmscan:mm_vmscan_direct_reclaim_end") $nr_reclaimed:long unsigned int > >> kernel.trace("vmscan:mm_vmscan_kswapd_sleep") $nid:int > >> kernel.trace("vmscan:mm_vmscan_kswapd_wake") $nid:int $zid:int $order:int > >> kernel.trace("vmscan:mm_vmscan_lru_isolate") $classzone_idx:int $order:int $nr_requested:long unsigned int $nr_scanned:long unsigned int $nr_taken:long unsigned int $isolate_mode:isolate_mode_t $file:int > >> kernel.trace("vmscan:mm_vmscan_lru_shrink_inactive") $nid:int $nr_scanned:long unsigned int $nr_reclaimed:long unsigned int $priority:int $file:int > >> kernel.trace("vmscan:mm_vmscan_memcg_isolate") $classzone_idx:int $order:int $nr_requested:long unsigned int $nr_scanned:long unsigned int $nr_taken:long unsigned int $isolate_mode:isolate_mode_t $file:int > >> kernel.trace("vmscan:mm_vmscan_memcg_reclaim_begin") $order:int $may_writepage:int $gfp_flags:gfp_t $classzone_idx:int > >> kernel.trace("vmscan:mm_vmscan_memcg_reclaim_end") $nr_reclaimed:long unsigned int > >> kernel.trace("vmscan:mm_vmscan_memcg_softlimit_reclaim_begin") $order:int $may_writepage:int $gfp_flags:gfp_t $classzone_idx:int > >> kernel.trace("vmscan:mm_vmscan_memcg_softlimit_reclaim_end") $nr_reclaimed:long unsigned int > >> kernel.trace("vmscan:mm_vmscan_wakeup_kswapd") $nid:int $zid:int $order:int > >> kernel.trace("vmscan:mm_vmscan_writepage") $page:struct page* > > > > this looks like it would achieve the same > > Is there anything else I can provide? I am not familiar with filesystems and their tracepoints which might tell us more [...] > Mar 1 06:30:59 curie kernel: kworker/u16:1: > Mar 1 06:30:59 curie kernel: kthreadd: page allocation stalls for 10197ms, order:1 > Mar 1 06:30:59 curie kernel: page allocation stalls for 11274ms, order:1 OK, so unlike in the previous situation, this is higher order request (aka 2 physically contiguous pages). > Mar 1 06:31:02 curie kernel: Normal free:130224kB min:128484kB low:160604kB high:192724kB active_anon:0kB inactive_anon:20kB active_file:308296kB inactive_file:3864kB unevictable:0kB writepending:0kB present:892920kB managed:791152kB mlocked:0kB slab_reclaimable:271984kB slab_unreclaimable:35880kB kernel_stack:1808kB pagetables:0kB bounce:0kB free_pcp:248kB local_pcp:0kB free_cma:0kB In this case there is a lot of page cache to be reaclaimed. Most of it on the active LRU list. There was a bug which could result in this fixed by b4536f0c829c ("mm, memcg: fix the active list aging for lowmem requests when memcg is enabled") which made it into the stable tree 4.9.5 but you have said that you are using 4.9.12 so you should already have it. So it seems that this pagecache is indeed activate based on the usage pattern. The rest of the report is rather messed up but I assume that you simply do not have contiguous memory in the lowmem. This surely sounds like a 32b specific problem which is not reasonably fixable. -- Michal Hocko SUSE Labs