From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-f70.google.com (mail-ed1-f70.google.com [209.85.208.70]) by kanga.kvack.org (Postfix) with ESMTP id 0C5486B0266 for ; Mon, 6 Aug 2018 08:00:45 -0400 (EDT) Received: by mail-ed1-f70.google.com with SMTP id t24-v6so1914428edq.13 for ; Mon, 06 Aug 2018 05:00:44 -0700 (PDT) Received: from mx1.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id w33-v6si11138250edd.30.2018.08.06.05.00.43 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 06 Aug 2018 05:00:43 -0700 (PDT) Date: Mon, 6 Aug 2018 14:00:42 +0200 From: Michal Hocko Subject: Re: Caching/buffers become useless after some time Message-ID: <20180806120042.GL19540@dhcp22.suse.cz> References: <20180716164500.GZ17280@dhcp22.suse.cz> <20180730144048.GW24267@dhcp22.suse.cz> <1f862d41-1e9f-5324-fb90-b43f598c3955@suse.cz> <30f7ec9a-e090-06f1-1851-b18b3214f5e3@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Marinko Catovic Cc: Vlastimil Babka , linux-mm@kvack.org [Please do not top-post] On Mon 06-08-18 12:29:43, Marinko Catovic wrote: > > Maybe a memcg with kmemcg limit? Michal could know more. > > Could you/Michael explain this perhaps? The only way how kmemcg limit could help I can think of would be to enforce metadata reclaim much more often. But that is rather a bad workaround. > The hardware is pretty much high end datacenter grade, I really would > not know how this is to be related with the hardware :( Well, there are some drivers (mostly out-of-tree) which are high order hungry. You can try to trace all allocations which with order > 0 and see who that might be. # mount -t tracefs none /debug/trace/ # echo stacktrace > /debug/trace/trace_options # echo "order>0" > /debug/trace/events/kmem/mm_page_alloc/filter # echo 1 > /debug/trace/events/kmem/mm_page_alloc/enable # cat /debug/trace/trace_pipe And later this to disable tracing. # echo 0 > /debug/trace/events/kmem/mm_page_alloc/enable > I do not understand why apparently the caching is working very much > fine for the beginning after a drop_caches, then degrades to low usage > somewhat later. Because a lot of FS metadata is fragmenting the memory and a large number of high order allocations which want to be served reclaim a lot of memory to achieve their gol. Considering a large part of memory is fragmented by unmovable objects there is no other way than to use reclaim to release that memory. > I can not possibly drop caches automatically, since > this requires monitoring for overload with temporary dropping traffic > on specific ports until the writes/reads cool down. You do not have to drop all caches. echo 2 > /proc/sys/vm/drop_caches should be sufficient to drop metadata only. -- Michal Hocko SUSE Labs