On 2018-07-18 08:24, Marc MERLIN wrote:
> On Wed, Jul 18, 2018 at 08:05:51AM +0800, Qu Wenruo wrote:
>> No OOM triggers? That's a little strange.
>> Maybe it's related to how kernel handles memory over-commit?
>
> Yes, I think you are correct.
>
>> And for the hang, I think it's related to some memory allocation failure
>> and error handler just didn't handle it well, so it's causing deadlock
>> for certain page.
>
> That indeed matches what I'm seeing.
>
>> ENOMEM handling is pretty common but hardly verified, so it's not that
>> strange, but we must locate the problem.
>
> I seem to be getting deadlocks in the kernel, so I'm hoping that at least
> it's checked there, but maybe not?
>
>> In my system, at least I'm not using btrfs as root fs, and for the
>> memory eating program I normally ensure it's eating all the memory +
>> swap, so OOM killer is always triggered, maybe that's the cause.
>>
>> So in your case, maybe it's btrfs not really taking up all memory, thus
>> OOM killer not triggered.
>
> Correct, the swap is not used.
>
>> Any kernel dmesg about OOM killer triggered?
>
> Nothing at all. It never gets triggered.
>
>>> Here is my system when it virtually died:
>>> USER       PID %CPU %MEM      VSZ      RSS TTY      STAT START   TIME COMMAND
>>> root     31006 21.2 90.7 29639020 29623180 pts/19   D+   13:49   1:35 ./btrfs check /dev/mapper/dshelf2
>
> See how btrfs was taking 29GB in that ps output (that's before it takes
> everything and I can't even type ps anymore)
> Note that VSZ is almost equal to RSS. Nothing gets swapped.
>
> Then see free output:
>
>>>              total       used       free     shared    buffers     cached
>>> Mem:      32643788   32180100     463688          0      44664     119508
>>> -/+ buffers/cache:   32015928     627860
>>> Swap:     15616764     443676   15173088
>>
>> For swap, it looks like only some other program's memory is swapped out,
>> not btrfs'.
>
> That's exactly correct. btrfs check never goes to swap, I'm not sure why,
> and because there is virtual memory free, maybe that's why OOM does not
> trigger?
> So I guess I can probably "fix" my problem by removing swap, but ultimately
> it would be useful to know why memory taken by btrfs check does not end up
> in swap.
>
>> And unfortunately, I'm not so familiar with OOM/MM code outside of
>> filesystem.
>> Any help from other experienced developers would definitely help to
>> solve why memory of 'btrfs check' is not swapped out or why OOM killer
>> is not triggered.
>
> Do you have someone from linux-vm you might be able to ask, or should we Cc
> this thread there?

Michal Hocko gave me a brief session about this, which was very helpful
in this case. Thank you, Michal!

Firstly, btrfs-progs allocates its memory with malloc(), which results in
anonymous pages, so that memory can be swapped out.

Secondly, the kernel is generally reluctant to swap out anonymous pages,
so it won't try to swap such pages out aggressively.

Thirdly, for user anonymous memory, an LRU-like algorithm decides which
pages get swapped out, and the way btrfs check uses its pages only makes
them harder to swap out.

So there is no easy way to aggressively push the memory of btrfs check
out to swap.

Michal did mention a cgroup-based way to limit the memory usage so that
it gets swapped out more aggressively, but I'm still digging into it.

Thanks,
Qu

>
> Thanks,
> Marc
>
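The ps and free figures Marc posted can be cross-checked with a little arithmetic. Both tools print these columns in KiB. Swapped-out pages still count toward VSZ but no longer toward RSS, so VSZ - RSS is an upper bound on how much of btrfs check could be sitting in swap. The sketch below only recomputes numbers copied from the quoted output; the variable names are mine.

```python
# Figures copied from the ps and free output quoted in this thread (all in KiB).
MEM_TOTAL, MEM_USED, MEM_FREE = 32643788, 32180100, 463688
BUFFERS, CACHED = 44664, 119508
SWAP_TOTAL, SWAP_USED = 15616764, 443676
VSZ, RSS = 29639020, 29623180  # ./btrfs check, PID 31006

# free's "-/+ buffers/cache" line treats buffers and page cache as reclaimable.
used_minus_cache = MEM_USED - BUFFERS - CACHED
free_plus_cache = MEM_FREE + BUFFERS + CACHED

# Swapped-out pages stay in VSZ but leave RSS, so this bounds btrfs check's swap use.
not_resident = VSZ - RSS

print(f"btrfs check RSS: {RSS / 2**20:.2f} GiB = {100 * RSS / MEM_TOTAL:.1f}% of RAM")
print(f"-/+ buffers/cache: used {used_minus_cache} KiB, free {free_plus_cache} KiB")
print(f"btrfs check memory not resident (at most this much swapped): {not_resident} KiB")
print(f"swap in use: {SWAP_USED} KiB ({100 * SWAP_USED / SWAP_TOTAL:.1f}% of swap)")
```

At most about 15 MiB of btrfs check's address space can be in swap, while about 433 MiB of swap is in use. That is consistent with the observation that the swapped-out memory belongs to other programs.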
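Michal's first point is that malloc() memory is anonymous: no file backs it, so swap is the only place the kernel can move it. Anonymous pages also only become resident (count toward RSS) once they are touched, which fits Marc's "VSZ is almost equal to RSS" if btrfs check writes to nearly everything it allocates. Below is a minimal, Linux-only sketch. It assumes /proc/self/status is readable, and it uses Python's mmap with MAP_PRIVATE to stand in for glibc malloc(), which by default serves large requests with private anonymous mmap(). The helpers rss_kib and touch_anonymous are hypothetical and not part of btrfs-progs.

```python
import mmap
from typing import Tuple

PAGE = mmap.PAGESIZE


def rss_kib() -> int:
    """Return this process's resident set size in KiB from /proc/self/status (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found in /proc/self/status")


def touch_anonymous(size: int) -> Tuple[int, int]:
    """Map `size` bytes of private anonymous memory, then write one byte per page.

    Returns (RSS growth right after mapping, RSS growth after touching), in KiB.
    """
    base = rss_kib()
    # fileno -1 makes Python add MAP_ANONYMOUS; MAP_PRIVATE matches what malloc() uses.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    after_map = rss_kib()  # address space reserved, but no pages faulted in yet
    for off in range(0, size, PAGE):
        buf[off] = 1  # first write to each page faults it in and makes it resident
    after_touch = rss_kib()
    buf.close()
    return after_map - base, after_touch - base


if __name__ == "__main__":
    mapped, touched = touch_anonymous(32 * 1024 * 1024)
    print(f"RSS growth after mmap: {mapped} KiB, after touching every page: {touched} KiB")
```

Mapping alone should barely move RSS, while touching every page should grow it by roughly the mapping size. Exact numbers vary with the kernel's RSS accounting and transparent huge pages.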
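Michal's third point is that an LRU-like algorithm picks which anonymous pages to swap out. The toy model below is a plain LRU over page IDs with a fixed number of RAM frames. It is not the kernel's actual active/inactive list implementation, and the workload is a hypothetical one: an idle process touches a few pages once, then a busy process, standing in for btrfs check, keeps cycling over its whole working set. Under that assumption the idle process's pages are always the least recently used, so they are the ones evicted. This mirrors Marc's report that only other programs' memory ended up in swap.

```python
from collections import OrderedDict
from typing import Iterable, List


def simulate_lru(accesses: Iterable[str], frames: int) -> List[str]:
    """Replay page accesses against `frames` slots of RAM with LRU eviction.

    Returns the evicted ("swapped out") pages in eviction order.
    """
    resident = OrderedDict()  # insertion order tracks recency: oldest first
    evicted: List[str] = []
    for page in accesses:
        if page in resident:
            resident.move_to_end(page)  # hit: now the most recently used page
            continue
        if len(resident) >= frames:
            victim, _ = resident.popitem(last=False)  # evict least recently used
            evicted.append(victim)
        resident[page] = None
    return evicted


# Hypothetical workload: 4 idle pages touched once, then a busy process cycling
# over 8 pages five times, with 10 frames of RAM in total.
idle = [f"idle{i}" for i in range(4)]
busy = [f"busy{i}" for i in range(8)]
trace = idle + busy * 5
print(simulate_lru(trace, frames=10))
```

Here only two idle pages are evicted and every busy page stays resident. If the busy working set grew past the available frames, plain LRU would start evicting the busy process's own pages too.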
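The cgroup approach Michal mentioned is not spelled out in this thread. Below is a sketch of one common form of it. It assumes a cgroup v2 hierarchy mounted at /sys/fs/cgroup, the memory controller enabled in the root's cgroup.subtree_control, and root privileges. The idea is to create a child group, set memory.high below the expected footprint so the kernel reclaims the group's memory (swapping out its anonymous pages) once usage exceeds it, and then put btrfs check in that group. memory.high, memory.max and cgroup.procs are real cgroup v2 interface files. The group name btrfs-check, the 24 GiB figure and the helpers limit_plan and apply_plan are hypothetical. In cgroup v2, memory already charged does not follow a process that migrates, so the group has to be joined before btrfs check starts allocating.

```python
import os
from typing import List, Optional, Tuple

CGROUP_ROOT = "/sys/fs/cgroup"  # assumption: cgroup v2 unified hierarchy mounted here


def limit_plan(name: str, high_bytes: int, pid: int,
               max_bytes: Optional[int] = None,
               root: str = CGROUP_ROOT) -> List[Tuple[str, str]]:
    """Return the (file, value) writes that confine `pid` to the cgroup `name`.

    memory.high triggers reclaim (and thus swap-out) above the given usage;
    memory.max is an optional hard cap. The pid is moved last, after the limits.
    """
    group = os.path.join(root, name)
    plan = [(os.path.join(group, "memory.high"), str(high_bytes))]
    if max_bytes is not None:
        plan.append((os.path.join(group, "memory.max"), str(max_bytes)))
    plan.append((os.path.join(group, "cgroup.procs"), str(pid)))
    return plan


def apply_plan(name: str, plan: List[Tuple[str, str]], root: str = CGROUP_ROOT) -> None:
    """Create the cgroup directory and perform the planned writes (needs root)."""
    os.makedirs(os.path.join(root, name), exist_ok=True)
    for path, value in plan:
        with open(path, "w") as f:
            f.write(value)


if __name__ == "__main__":
    # Hypothetical launcher: confine this process first so later allocations are
    # charged to the new group, then exec btrfs check from it. Only prints here.
    plan = limit_plan("btrfs-check", high_bytes=24 * 2**30, pid=os.getpid())
    for path, value in plan:
        print(path, value)
    # As root: apply_plan("btrfs-check", plan), then
    # os.execvp("btrfs", ["btrfs", "check", "/dev/mapper/dshelf2"])
```

memory.high throttles and reclaims rather than killing, so it suits pushing memory to swap. memory.max, if set, can end in an OOM kill inside the group when reclaim cannot keep up.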