> It might be also interesting to do in the problematic state, instead of
> dropping caches:
>
> - save snapshot of /proc/vmstat and /proc/pagetypeinfo
> - echo 1 > /proc/sys/vm/compact_memory
> - save new snapshot of /proc/vmstat and /proc/pagetypeinfo

A worst case was just in progress: cache usage was down to about
100MB/10GB and performance was extremely low, but I could not see any
improvement after the echo 1. I watched it for about 3 minutes and the
cache usage did not change.

pagetypeinfo before echo https://pastebin.com/MjSgiMRL
pagetypeinfo 3min after echo https://pastebin.com/uWM6xGDd

vmstat before echo https://pastebin.com/TjYSKNdE
vmstat 3min after echo https://pastebin.com/MqTibEKi
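
To compare the before/after vmstat snapshots, a small helper like the
following could list which counters actually moved, for example only the
compact_* ones. This is a minimal sketch that assumes the plain
"name value" per-line format of /proc/vmstat. The script name
vmstat_diff.py and the snapshot file names are hypothetical.

```python
import sys


def parse_vmstat(text):
    """Parse /proc/vmstat-style text ("name value" per line) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly a name followed by an integer.
        if len(parts) == 2 and parts[1].lstrip("-").isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def diff_vmstat(before, after, prefix=""):
    """Return {name: after - before} for changed counters whose name starts with prefix."""
    changed = {}
    for name, new in after.items():
        if not name.startswith(prefix):
            continue
        delta = new - before.get(name, 0)
        if delta != 0:
            changed[name] = delta
    return changed


# Hypothetical usage: python vmstat_diff.py before.txt after.txt compact_
if __name__ == "__main__" and len(sys.argv) >= 3:
    prefix = sys.argv[3] if len(sys.argv) > 3 else ""
    with open(sys.argv[1]) as f:
        before = parse_vmstat(f.read())
    with open(sys.argv[2]) as f:
        after = parse_vmstat(f.read())
    for name, delta in sorted(diff_vmstat(before, after, prefix).items()):
        print(f"{name:40s} {delta:+d}")
```

Filtering on a prefix keeps the output short, since most of the ~100+
counters in /proc/vmstat change constantly on a busy host.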

> Btw. vast majority of order-3 requests come from the network layer. Are
> you using a large MTU (jumbo packets)?

Not that I know of. How would I figure that out? I have not touched the
net.* sysctls apart from a few values that, AFAIK, are unrelated to the
MTU.
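
One way to check: each interface's MTU is exposed in sysfs at
/sys/class/net/<iface>/mtu (ip link show prints the same value as
"mtu N"), and anything above the standard Ethernet 1500 suggests jumbo
frames. A minimal sketch, assuming the usual sysfs layout; the function
names are made up for illustration.

```python
import os

STANDARD_ETHERNET_MTU = 1500


def interface_mtus(sysfs_net="/sys/class/net"):
    """Read the MTU of every interface listed under sysfs_net."""
    mtus = {}
    for iface in sorted(os.listdir(sysfs_net)):
        try:
            with open(os.path.join(sysfs_net, iface, "mtu")) as f:
                mtus[iface] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # entry without a readable mtu attribute
    return mtus


def jumbo_interfaces(mtus, threshold=STANDARD_ETHERNET_MTU, ignore=("lo",)):
    """Return interfaces whose MTU exceeds threshold.

    Loopback has a large MTU by default, so it is ignored here.
    """
    return {i: m for i, m in mtus.items() if m > threshold and i not in ignore}


if __name__ == "__main__" and os.path.isdir("/sys/class/net"):
    for iface, mtu in jumbo_interfaces(interface_mtus()).items():
        print(f"{iface}: mtu {mtu}")
```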

> Btw. I was probably not specific enough. This data should be collected
> _during_ the time when the page cache is disappearing. I suspect you
> have started collecting after the fact.

Meh, I just messed up that output with the latest drop_caches, but I am
pretty sure the one you see was taken while usage was around
300MB/10GB, before dropping caches.
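
To capture the data while the page cache is actually shrinking rather
than afterwards, a periodic sampler could copy the relevant /proc files
to timestamped snapshots and log the current Cached value. A minimal
sketch; the output directory, interval, sample count, and file naming
are assumptions, and /proc/pagetypeinfo is usually readable only by
root.

```python
import os
import time

# Files to capture on every sample (assumed set; adjust as needed).
SOURCES = ("/proc/meminfo", "/proc/vmstat", "/proc/pagetypeinfo")


def cached_kib(meminfo_text):
    """Return the "Cached:" value in kB from /proc/meminfo text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("Cached:"):  # does not match "SwapCached:"
            return int(line.split()[1])
    return None


def snapshot(outdir, sources=SOURCES):
    """Copy each source into outdir with a timestamp prefix; return written paths."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    os.makedirs(outdir, exist_ok=True)
    written = []
    for src in sources:
        dst = os.path.join(outdir, f"{stamp}-{os.path.basename(src)}")
        try:
            # Plain read/write: procfs files report size 0, so avoid copy helpers.
            with open(src) as fin, open(dst, "w") as fout:
                fout.write(fin.read())
        except OSError:
            continue  # missing file or permission denied
        written.append(dst)
    return written


def sample(outdir, interval=10.0, count=360):
    """Take count snapshots, interval seconds apart, printing Cached each time."""
    for _ in range(count):
        with open("/proc/meminfo") as f:
            cached = cached_kib(f.read())
        paths = snapshot(outdir)
        print(f"{time.strftime('%H:%M:%S')} Cached={cached} kB, {len(paths)} files")
        time.sleep(interval)
```

Running something like sample("/var/tmp/mm-snapshots") for an hour would
leave a series of snapshots spanning the moment the cache collapses.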

I was thinking it might really help if one of you could log in to the
hosts while they are in that state, so you can see for yourselves. Due
to privacy concerns (GDPR and such) I would need to monitor the session,
so the SSH login would have to go through something like TeamViewer on
my host. Please let me know if anyone is willing; nothing I have tried
over the past 3 months has helped. Thanks for your efforts. Surely any
diagnosis would be easier this way.