Hi everyone,

Recently we hit an IO performance issue caused by pagecache thrashing
in a cgroup, and we found that it was introduced by commit 815744d75152
("mm: memcontrol: don't batch updates of local VM stats and events").

The problem can be easily reproduced in a docker environment. First,
create a container with a 4G memory limit and a 2G swap limit, then run
a program which allocates (6G - 50M) of anon memory, so that only 50M
of memory can still be used and there is no swap space left. Then run
"yum install gcc". We can observe that the yum program is thrashing and
IO stays high for a long time, but oom is not triggered. This affects
other processes and containers on the machine.
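
For reference, the allocating program from the steps above could look
roughly like the sketch below. It is only an illustration, not the
exact program we ran: the helper names allocate_anon() and hog_size()
are made up for this mail, and G/M are assumed to mean GiB/MiB.

```python
import mmap

MIB = 1024 * 1024
GIB = 1024 * MIB
PAGE_SIZE = mmap.PAGESIZE


def hog_size(mem_limit, swap_limit, headroom):
    """Bytes to allocate so that only `headroom` bytes stay usable."""
    return mem_limit + swap_limit - headroom


def allocate_anon(nbytes):
    """Map private anonymous memory and touch every page of it."""
    region = mmap.mmap(
        -1,  # no backing file
        nbytes,
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )
    # Pages are only charged to the memcg once they are faulted in,
    # so write one byte into each page.
    for offset in range(0, nbytes, PAGE_SIZE):
        region[offset] = 1
    return region
```

Inside the container one would call
allocate_anon(hog_size(4 * GIB, 2 * GIB, 50 * MIB)), keep the returned
mapping alive (for example by sleeping afterwards), and then start
"yum install gcc" in the same container.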

After analysis, we found that there is a large number of readahead
failures during this time. Since page allocations from pagecache
readahead carry the __GFP_NORETRY flag, the oom is skipped when the
memcg limit is reached. The pagecache is repeatedly allocated and
reclaimed, and the value of workingset_refault_file is high. These
readaheads take a lot of time, consume a lot of IO throughput and
impact the entire system. This goes on for a long time, until some
other page allocation triggers the oom.
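
One way to watch the file refault counter (workingset_refault_file in
the cgroup's memory.stat) is to sample the file twice and compare. The
sketch below is only an illustration: the helper names are made up for
this mail, and it assumes the usual "name value" line format of
memory.stat.

```python
def parse_memory_stat(text):
    """Parse 'name value' lines from memory.stat into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not a plain "name <integer>" pair.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def refault_delta(before, after, key="workingset_refault_file"):
    """Return how much the given counter grew between two samples."""
    return after.get(key, 0) - before.get(key, 0)
```

Reading the container's memory.stat a few seconds apart and passing
both parsed samples to refault_delta() gives the number of file
refaults in that interval. The location of memory.stat depends on the
cgroup setup, so it is not hard-coded here.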

By bisection, we finally found commit 815744d75152 ("mm: memcontrol:
don't batch updates of local VM stats and events"). Before that commit,
the process triggers oom in a very short time. We suspect the
difference is caused by performance changes.

Is there any good way to fix the problem? We would prefer the process
to be oom-killed rather than have the system hang and affect other
processes.

Thanks,