On Aug 2, 2019, at 12:14 PM, Michal Hocko <mhocko@kernel.org> wrote:

On Fri 02-08-19 11:00:55, Masoud Sharbiani wrote:


On Aug 2, 2019, at 7:41 AM, Michal Hocko <mhocko@kernel.org> wrote:

On Fri 02-08-19 07:18:17, Masoud Sharbiani wrote:


On Aug 2, 2019, at 12:40 AM, Michal Hocko <mhocko@kernel.org> wrote:

On Thu 01-08-19 11:04:14, Masoud Sharbiani wrote:
Hey folks,
I’ve come across an issue that affects most of the 4.19, 4.20 and 5.2 linux-stable kernels and has only been fixed in 5.3-rc1.
It was introduced by

29ef680 memcg, oom: move out_of_memory back to the charge path 

This commit shouldn't really change the OOM behavior for your particular
test case. It would have changed MAP_POPULATE behavior but your usage is
triggering the standard page fault path. The only difference with
29ef680 is that the OOM killer is invoked during the charge path rather
than on the way out of the page fault.

Anyway, I tried to run your test case in a loop and leaker always ends
up being killed as expected with 5.2. See the below oom report. There
must be something else going on. How much swap do you have on your
system?

I do not have swap defined. 

OK, I have retested with swap disabled and again everything seems to be
working as expected. The oom happens earlier because I do not have to
wait for the swap to get full.


In my tests (with the script provided), it only runs 11 iterations before hanging and printing the soft lockup message.


Which fs do you use to write the file that you mmap?

/dev/sda3 on / type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)

The soft lockup backtrace shows that it is going through __xfs_filemap_fault():

Right, I have just missed that.

[...]

If I switch the backing file to an ext4 filesystem (separate hard drive), it OOMs.


If I switch the file used to /dev/zero, it OOMs: 

Todal sum was 0. Loop count is 11
Buffer is @ 0x7f2b66c00000
./test-script-devzero.sh: line 16:  3561 Killed                  ./leaker -p 10240 -c 100000


Or could you try to
simplify your test even further? E.g. does everything work as expected
when doing anonymous mmap rather than file backed one?

It also OOMs with MAP_ANON. 
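For readers following the anonymous vs. file-backed comparison, here is a minimal sketch of how a reproducer might switch between the two mapping types. It is not the leaker program used in this thread (that source isn't shown); the helper name fault_in_mapping is hypothetical. With path == NULL it uses a private MAP_ANONYMOUS mapping and writes each page; otherwise it maps the file MAP_SHARED and reads each page, so faults go through the filesystem's fault handler and the page cache.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the leaker program from this thread.
 * path == NULL: private anonymous mapping; each page is written so a
 *               real page is allocated on the write fault.
 * path != NULL: shared file-backed mapping; each page is read, so the
 *               fault goes through the filesystem fault handler.
 *               (A regular file shorter than len raises SIGBUS on
 *               access past EOF.)
 * Returns the sum of the first byte of every page, or -1 on error.
 */
long fault_in_mapping(const char *path, size_t len)
{
    int fd = -1;
    int prot = PROT_READ;
    int flags = MAP_SHARED;

    if (path == NULL) {
        prot |= PROT_WRITE;
        flags = MAP_PRIVATE | MAP_ANONYMOUS;
    } else {
        fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;
    }

    unsigned char *buf = mmap(NULL, len, prot, flags, fd, 0);
    if (fd >= 0)
        close(fd);           /* the mapping keeps its own file reference */
    if (buf == MAP_FAILED)
        return -1;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    long sum = 0;
    for (size_t off = 0; off < len; off += page) {
        if (path == NULL)
            buf[off] = 0;    /* write fault: allocate an anonymous page */
        sum += buf[off];     /* read fault in the file-backed case */
    }

    munmap(buf, len);
    return sum;
}
```

Calling fault_in_mapping(NULL, len) approximates the MAP_ANON case and fault_in_mapping("/dev/zero", len) the /dev/zero case; pointing path at a file on xfs vs. ext4 would exercise the file-backed paths compared above.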

Hope that helps.

It helps to focus more on the xfs reclaim path. Just to be sure, is
there any difference if you use cgroup v2? I do not expect there to be, but
I want to rule out any v1 artifacts.

I was unable to use cgroup v2. I created a new control group, but moving a running process into it fails with ‘Device or resource busy’.

Masoud

-- 
Michal Hocko
SUSE Labs