Subject: Re: Caching/buffers become useless after some time
References: <20180712113411.GB328@dhcp22.suse.cz> <20180716162337.GY17280@dhcp22.suse.cz> <20180716164500.GZ17280@dhcp22.suse.cz>
From: Vlastimil Babka
Date: Fri, 27 Jul 2018 13:15:33 +0200
Sender: owner-linux-mm@kvack.org
To: Marinko Catovic, linux-mm@kvack.org

On 07/21/2018 12:03 AM, Marinko Catovic wrote:
> I let this run for 3 days now, so it is quite a lot, there you go:
> https://nofile.io/f/egGyRjf0NPs/vmstat.tar.gz

The stats show that compaction has very bad results. Between first and
last snapshot, compact_fail grew by 80k and compact_success by 1300.
High-order allocations will thus cycle between (failing) compaction and
reclaim that removes the buffer/caches from memory.

Since dropping slab caches helps, I suspect it's either the slab pages
(which cannot be migrated for compaction) being spread over all memory,
making it impossible to assemble high-order pages, or some slab objects
are pinning file pages making them also impossible to be migrated.
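
For reference, the comparison of compact_fail and compact_success between
two snapshots can be sketched roughly as below. This is only an
illustration, not the script used for the numbers above: the helper names
parse_vmstat and compaction_delta are hypothetical, and the absolute
counter values in the demo are made up so that the deltas land near the
ones quoted (80k failures, 1300 successes).

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style text ("name value" per line) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def compaction_delta(first, last):
    """Return how much compact_fail/compact_success grew between snapshots."""
    fail = last["compact_fail"] - first["compact_fail"]
    success = last["compact_success"] - first["compact_success"]
    attempts = fail + success
    # Share of finished compaction attempts that succeeded (None if no attempts).
    share = success / attempts if attempts else None
    return {"fail": fail, "success": success, "success_share": share}


if __name__ == "__main__":
    # Made-up absolute values; only the deltas mirror the figures in the mail.
    first = parse_vmstat("compact_fail 1000\ncompact_success 50\n")
    last = parse_vmstat("compact_fail 81000\ncompact_success 1350\n")
    print(compaction_delta(first, last))
```

With deltas of that size, fewer than 2% of the compaction attempts that
ran to completion succeeded.
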
> There is one thing I forgot to mention: the hosts perform find and du (I
> mean the commands, finding files and disk usage)
> on the HDDs every night, starting from 00:20 AM up until in the morning
> 07:45 AM, for maintenance and stats.
>
> During this period the buffers/caches raise again as you may see from
> the logs, so find/du do fill them.
> Nevertheless as the day passes both decrease again until low values are
> reached.
> I disabled find/du for the night on 19->20th July to compare.
>
> I have to say that this really low usage (300MB/xGB) occured just once
> after I upgraded from 4.16 to 4.17, not sure
> why, where one can still see from the logs that the buffers/cache is not
> using up the entire available RAM.
>
> This low usage occured the last time on that one host when I mentioned
> that I had to 2>drop_caches again in my
> previous message, so this is still an issue even on the latest kernel.
>
> The other host (the one that was not measured with the vmstat logs) has
> currently 600MB/14GB, 34GB of free RAM.
> Both were reset with drop_caches at the same time. From the looks of
> this the really low usage will occur again
> somewhat shortly, it just did not come up during measurement. However,
> the RAM should be full anyway, true?

Can you provide (a single snapshot) /proc/pagetypeinfo and
/proc/slabinfo from a system that's currently experiencing the issue,
also with /proc/vmstat and /proc/zoneinfo to verify? Thanks.
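
In case it helps, one possible way to grab all four files in one go is
sketched below. The function name take_snapshot and the output directory
are hypothetical, and plain cat/cp works just as well. Some of these
files (e.g. /proc/slabinfo) are typically readable only by root, so the
sketch records None for anything it could not read instead of failing.

```python
import os

# The four files requested above.
PROC_FILES = [
    "/proc/pagetypeinfo",
    "/proc/slabinfo",
    "/proc/vmstat",
    "/proc/zoneinfo",
]


def take_snapshot(out_dir, paths=PROC_FILES):
    """Copy each readable file into out_dir; return {path: saved copy or None}."""
    os.makedirs(out_dir, exist_ok=True)
    saved = {}
    for path in paths:
        try:
            # Read until EOF; /proc files report size 0, so do not rely on stat.
            with open(path, "r") as src:
                data = src.read()
        except OSError:
            saved[path] = None  # missing, or needs root
            continue
        target = os.path.join(out_dir, os.path.basename(path))
        with open(target, "w") as dst:
            dst.write(data)
        saved[path] = target
    return saved
```

Running take_snapshot("mm-snapshot") as root while the cache usage is low
should leave four files in that directory that can be tarred up and sent.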