> Maybe a memcg with kmemcg limit? Michal could know more.

Could you or Michal explain this, perhaps? The hardware is pretty much
high-end datacenter grade; I really would not know how this could be
related to the hardware :(

I do not understand why the caching apparently works just fine for a
while after a drop_caches and then degrades to low usage later. I cannot
simply drop caches automatically, since that requires monitoring for
overload and temporarily dropping traffic on specific ports until the
writes/reads cool down.

2018-08-06 11:40 GMT+02:00 Vlastimil Babka:

> On 08/03/2018 04:13 PM, Marinko Catovic wrote:
> > Thanks for the analysis.
> >
> > So since I am no mem management dev, what exactly does this mean?
> > Is there any way of workaround or quickfix or something that can/will
> > be fixed at some point in time?
>
> Workaround would be the manual / periodic cache flushing, unfortunately.
> Maybe a memcg with kmemcg limit? Michal could know more.
> A long-term generic solution will be much harder to find :(
>
> > I can not imagine that I am the only one who is affected by this, nor
> > do I know why my use case would be so much different from any other.
> > Most 'cloud' services should be affected as well.
>
> Hmm, either your workload is specific in being hungry for fs metadata
> and not much data (page cache). And/or there's some source of the
> high-order allocations that others don't have, possibly related to some
> piece of hardware?
>
> > Tell me if you need any other snapshots or whatever info.
> >
> > 2018-08-02 18:15 GMT+02:00 Vlastimil Babka:
> >
> > On 07/31/2018 12:08 AM, Marinko Catovic wrote:
> > >
> > >> Can you provide (a single snapshot) /proc/pagetypeinfo and
> > >> /proc/slabinfo from a system that's currently experiencing the issue,
> > >> also with /proc/vmstat and /proc/zoneinfo to verify? Thanks.
> > >
> > > your request came in just one day after I 2>drop_caches again when
> > > the ram usage was really really low again. Up until now it did not
> > > reoccur on any of the 2 hosts, where one shows 550MB/11G with 37G of
> > > totally free ram for now - so not that low like last time when I
> > > dropped it, I think it was like 300M/8G or so, but I hope it helps:
> >
> > Thanks.
> >
> > > /proc/pagetypeinfo https://pastebin.com/6QWEZagL
> >
> > Yep, looks like fragmented by reclaimable slabs:
> >
> > Node 0, zone Normal, type Unmovable     29101  32754   8372   2790   1334    354     23     3    4   0   0
> > Node 0, zone Normal, type Movable      142449  83386  99426  69177  36761  12931   1378    24    0   0   0
> > Node 0, zone Normal, type Reclaimable  467195 530638 355045 192638  80358  15627   2029   231   18   0   0
> >
> > Number of blocks type  Unmovable  Movable  Reclaimable  HighAtomic  Isolate
> > Node 0, zone DMA               1        7            0           0        0
> > Node 0, zone DMA32            34      703          375           0        0
> > Node 0, zone Normal         1672    14276        15659           1        0
> >
> > Half of the memory is marked as reclaimable (2 megabyte) pageblocks.
> > zoneinfo has nr_slab_reclaimable 1679817, so the reclaimable slabs
> > occupy only 3280 (6G) pageblocks, yet they are spread over 5 times as
> > many. It's also possible they pollute the Movable pageblocks as well,
> > but the stats can't tell us. Either the page grouping mobility
> > heuristics are broken here, or the worst-case scenario happened -
> > memory was at some point really wholly filled with reclaimable slabs,
> > and the rather random reclaim did not result in whole pageblocks
> > being freed.
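To make the pageblock arithmetic above easier to follow, here is a
minimal sketch in Python that reproduces the estimate from the pasted
numbers. It assumes the usual x86_64 values of 4 KiB pages and 2 MiB
(order-9) pageblocks, and the figures are copied from the /proc/vmstat
and /proc/pagetypeinfo snapshots rather than read live:

PAGE_SIZE = 4096                     # bytes per page (x86_64 default)
PAGEBLOCK_PAGES = 512                # 2 MiB pageblock / 4 KiB page

nr_slab_reclaimable = 1679817        # pages, from the /proc/vmstat snapshot
reclaimable_blocks = 375 + 15659     # pageblocks marked Reclaimable (DMA32 + Normal)

slab_gib = nr_slab_reclaimable * PAGE_SIZE / 2**30
min_blocks = -(-nr_slab_reclaimable // PAGEBLOCK_PAGES)   # ceiling division

print(f"reclaimable slab memory      : {slab_gib:.1f} GiB")
print(f"minimum pageblocks needed    : {min_blocks}")
print(f"pageblocks marked Reclaimable: {reclaimable_blocks}")
print(f"spread factor                : {reclaimable_blocks / min_blocks:.1f}x")

Running this prints roughly 6.4 GiB of reclaimable slab, a minimum of
about 3281 pageblocks, and a spread factor close to 5x, which matches
the estimate in the quoted analysis.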
> > > /proc/slabinfo https://pastebin.com/81QAFgke
> >
> > Largest caches seem to be:
> >
> > # name           <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> > ext4_inode_cache       3107754    3759573      1080            3              1 : tunables      24           12              8 : slabdata      1253191    1253191             0
> > dentry                 2840237    7328181       192           21              1 : tunables     120           60              8 : slabdata       348961     348961           120
> >
> > The internal fragmentation of dentry cache is significant as well.
> > Dunno if some of those objects pin movable pages as well...
> >
> > So looks like there's insufficient slab reclaim (shrinker activity),
> > and possibly problems with page grouping by mobility heuristics as
> > well...
> >
> > > /proc/vmstat https://pastebin.com/S7mrQx1s
> > > /proc/zoneinfo https://pastebin.com/csGeqNyX
> > >
> > > also please note - whether this makes any difference: there is no
> > > swap file/partition, I am using this without swap space. imho this
> > > should not be necessary since applications running on the hosts
> > > would not consume more than 20GB, the rest should be used by
> > > buffers/cache.
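To put a rough number on the dentry internal fragmentation mentioned in
the quote, here is a small sketch in the same spirit. It again assumes
4 KiB pages and uses only the two /proc/slabinfo lines pasted above;
per-slab metadata overhead is ignored, so treat the output as an
illustration rather than an exact accounting:

PAGE_SIZE = 4096

# name: (active_objs, num_objs, objsize, pagesperslab, num_slabs),
# copied from the quoted /proc/slabinfo lines
caches = {
    "ext4_inode_cache": (3107754, 3759573, 1080, 1, 1253191),
    "dentry":           (2840237, 7328181,  192, 1,  348961),
}

for name, (active, total, objsize, pages_per_slab, slabs) in caches.items():
    held_gib = slabs * pages_per_slab * PAGE_SIZE / 2**30   # memory pinned by the cache
    live_gib = active * objsize / 2**30                     # memory in live objects
    print(f"{name:17s} holds {held_gib:.2f} GiB for {live_gib:.2f} GiB of live objects "
          f"({100 * active / total:.0f}% of object slots in use)")

This reports roughly 4.8 GiB held by ext4_inode_cache for about 3.1 GiB
of live inodes (~83% of slots in use), versus about 1.3 GiB held by the
dentry cache for only ~0.5 GiB of live dentries (~39% of slots in use).
A slab page with even one live object cannot be freed, so such sparsely
populated caches are one plausible way reclaim fails to hand back whole
pageblocks, consistent with the analysis above.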