> Maybe a memcg with kmemcg limit? Michal could know more.

Could you or Michal explain this, perhaps? The hardware is pretty much
high-end datacenter grade; I really would not know how this could be
related to the hardware :(

I do not understand why the caching apparently works just fine for a
while after a drop_caches and then degrades to low usage later. I cannot
simply drop caches automatically, since that requires monitoring for
overload and temporarily dropping traffic on specific ports until the
writes/reads cool down.

2018-08-06 11:40 GMT+02:00 Vlastimil Babka:

> On 08/03/2018 04:13 PM, Marinko Catovic wrote:
> > Thanks for the analysis.
> >
> > So since I am no mem management dev, what exactly does this mean?
> > Is there any way of workaround or quickfix or something that can/will
> > be fixed at some point in time?
>
> Workaround would be the manual / periodic cache flushing, unfortunately.
> Maybe a memcg with kmemcg limit? Michal could know more.
> A long-term generic solution will be much harder to find :(
>
> > I can not imagine that I am the only one who is affected by this, nor
> > do I know why my use case would be so much different from any other.
> > Most 'cloud' services should be affected as well.
>
> Hmm, either your workload is specific in being hungry for fs metadata
> and not much data (page cache). And/or there's some source of the
> high-order allocations that others don't have, possibly related to some
> piece of hardware?
>
> > Tell me if you need any other snapshots or whatever info.
> >
> > 2018-08-02 18:15 GMT+02:00 Vlastimil Babka:
> >
> > On 07/31/2018 12:08 AM, Marinko Catovic wrote:
> > >
> > >> Can you provide (a single snapshot) /proc/pagetypeinfo and
> > >> /proc/slabinfo from a system that's currently experiencing the issue,
> > >> also with /proc/vmstat and /proc/zoneinfo to verify? Thanks.
> > >
> > > your request came in just one day after I 2>drop_caches again when
> > > the ram usage was really really low again. Up until now it did not
> > > reoccur on any of the 2 hosts, where one shows 550MB/11G with 37G of
> > > totally free ram for now - so not that low like last time when I
> > > dropped it, I think it was like 300M/8G or so, but I hope it helps:
> >
> > Thanks.
> >
> > > /proc/pagetypeinfo https://pastebin.com/6QWEZagL
> >
> > Yep, looks like fragmented by reclaimable slabs:
> >
> > Node 0, zone Normal, type Unmovable     29101  32754   8372   2790   1334    354     23     3    4   0   0
> > Node 0, zone Normal, type Movable      142449  83386  99426  69177  36761  12931   1378    24    0   0   0
> > Node 0, zone Normal, type Reclaimable  467195 530638 355045 192638  80358  15627   2029   231   18   0   0
> >
> > Number of blocks type  Unmovable  Movable  Reclaimable  HighAtomic  Isolate
> > Node 0, zone DMA               1        7            0           0        0
> > Node 0, zone DMA32            34      703          375           0        0
> > Node 0, zone Normal         1672    14276        15659           1        0
> >
> > Half of the memory is marked as reclaimable (2 megabyte) pageblocks.
> > zoneinfo has nr_slab_reclaimable 1679817, so the reclaimable slabs
> > occupy only 3280 (6G) pageblocks, yet they are spread over 5 times as
> > many. It's also possible they pollute the Movable pageblocks as well,
> > but the stats can't tell us. Either the page grouping mobility
> > heuristics are broken here, or the worst-case scenario happened -
> > memory was at some point really wholly filled with reclaimable slabs,
> > and the rather random reclaim did not result in whole pageblocks
> > being freed.
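To make the pageblock arithmetic above easier to follow, here is a
minimal sketch in Python that reproduces the estimate from the pasted
numbers. It assumes the usual x86_64 values of 4 KiB pages and 2 MiB
(order-9) pageblocks, and the figures are copied from the /proc/vmstat
and /proc/pagetypeinfo snapshots rather than read live:

PAGE_SIZE = 4096                     # bytes per page (x86_64 default)
PAGEBLOCK_PAGES = 512                # 2 MiB pageblock / 4 KiB page

nr_slab_reclaimable = 1679817        # pages, from the /proc/vmstat snapshot
reclaimable_blocks = 375 + 15659     # pageblocks marked Reclaimable (DMA32 + Normal)

slab_gib = nr_slab_reclaimable * PAGE_SIZE / 2**30
min_blocks = -(-nr_slab_reclaimable // PAGEBLOCK_PAGES)   # ceiling division

print(f"reclaimable slab memory      : {slab_gib:.1f} GiB")
print(f"minimum pageblocks needed    : {min_blocks}")
print(f"pageblocks marked Reclaimable: {reclaimable_blocks}")
print(f"spread factor                : {reclaimable_blocks / min_blocks:.1f}x")

Running this prints roughly 6.4 GiB of reclaimable slab, a minimum of
about 3281 pageblocks, and a spread factor close to 5x, which matches
the estimate in the quoted analysis.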
> > > /proc/slabinfo https://pastebin.com/81QAFgke
> >
> > Largest caches seem to be:
> >
> > # name           <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> > ext4_inode_cache       3107754    3759573      1080            3              1 : tunables      24           12              8 : slabdata      1253191    1253191             0
> > dentry                 2840237    7328181       192           21              1 : tunables     120           60              8 : slabdata       348961     348961           120
> >
> > The internal fragmentation of dentry cache is significant as well.
> > Dunno if some of those objects pin movable pages as well...
> >
> > So looks like there's insufficient slab reclaim (shrinker activity),
> > and possibly problems with page grouping by mobility heuristics as
> > well...
> >
> > > /proc/vmstat https://pastebin.com/S7mrQx1s
> > > /proc/zoneinfo https://pastebin.com/csGeqNyX
> > >
> > > also please note - whether this makes any difference: there is no
> > > swap file/partition, I am using this without swap space. imho this
> > > should not be necessary since applications running on the hosts
> > > would not consume more than 20GB, the rest should be used by
> > > buffers/cache.
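To put a rough number on the dentry internal fragmentation mentioned in
the quote, here is a small sketch in the same spirit. It again assumes
4 KiB pages and uses only the two /proc/slabinfo lines pasted above;
per-slab metadata overhead is ignored, so treat the output as an
illustration rather than an exact accounting:

PAGE_SIZE = 4096

# name: (active_objs, num_objs, objsize, pagesperslab, num_slabs),
# copied from the quoted /proc/slabinfo lines
caches = {
    "ext4_inode_cache": (3107754, 3759573, 1080, 1, 1253191),
    "dentry":           (2840237, 7328181,  192, 1,  348961),
}

for name, (active, total, objsize, pages_per_slab, slabs) in caches.items():
    held_gib = slabs * pages_per_slab * PAGE_SIZE / 2**30   # memory pinned by the cache
    live_gib = active * objsize / 2**30                     # memory in live objects
    print(f"{name:17s} holds {held_gib:.2f} GiB for {live_gib:.2f} GiB of live objects "
          f"({100 * active / total:.0f}% of object slots in use)")

This reports roughly 4.8 GiB held by ext4_inode_cache for about 3.1 GiB
of live inodes (~83% of slots in use), versus about 1.3 GiB held by the
dentry cache for only ~0.5 GiB of live dentries (~39% of slots in use).
A slab page with even one live object cannot be freed, so such sparsely
populated caches are one plausible way reclaim fails to hand back whole
pageblocks, consistent with the analysis above.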