> Trace data which starts _before_ the cache dropdown starts and while it
> is decreasing should be the first step. Ideally along with /proc/vmstat
> gathered at the same time. I am pretty sure you have some high order
> memory consumer which forces the reclaim and we over reclaim. Last data
> was not really conclusive as it didn't really capture the dropdown
> IIRC.

By "before", do you mean in a totally healthy state? Since I cannot tell
when the decrease starts, this would perhaps mean collecting data over
several days. However, I have no issue with that.

As I do not want to miss anything that might help you, could you please
provide the commands for all the data you require? One host is in a
healthy state right now, so I would run them there immediately.
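
For the /proc/vmstat part, something along these lines could run for days in the background. This is only a minimal sketch: the output directory, the 10 second interval and the function names are assumptions, not anything requested above, and it does not cover the trace data itself.

```python
import time
from pathlib import Path


def parse_vmstat(text):
    """Parse 'name value' lines (the /proc/vmstat layout) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("-").isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def snapshot(out_dir, source="/proc/vmstat", now=None):
    """Copy the current counters to out_dir/vmstat.<epoch seconds>."""
    stamp = int(time.time() if now is None else now)
    text = Path(source).read_text()
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"vmstat.{stamp}"
    target.write_text(text)
    return target


def collect(out_dir, interval=10.0, count=None, source="/proc/vmstat"):
    """Take a snapshot every `interval` seconds.

    count=None keeps going until the process is interrupted.
    Returns the number of snapshots taken.
    """
    taken = 0
    while count is None or taken < count:
        snapshot(out_dir, source)
        taken += 1
        if count is None or taken < count:
            time.sleep(interval)
    return taken
```

Calling `collect("vmstat-logs")` would write one timestamped file every 10 seconds until interrupted, so the snapshots can later be lined up against the trace data. With an interval below one second, two snapshots can share a file name and the later one overwrites the earlier one. `parse_vmstat` is only there to compare counters between two snapshots afterwards.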