* OOM killer not nearly agressive enough? @ 2020-01-07 20:44 Pavel Machek 2020-01-09 11:56 ` Michal Hocko 0 siblings, 1 reply; 12+ messages in thread From: Pavel Machek @ 2020-01-07 20:44 UTC (permalink / raw) To: kernel list, Andrew Morton, linux-mm, akpm [-- Attachment #1: Type: text/plain, Size: 639 bytes --] Hi! I updated my userspace to x86-64, and now chromium likes to eat all the memory and bring the system to a standstill. Unfortunately, the OOM killer does not react: I'm now running "ps aux", and it prints one line every 20 seconds or more. Do we agree that this is an "unusable" system? I attempted to run kill from another session. Do we agree that the OOM killer should have reacted way sooner? Is there something I can tweak to make it behave more reasonably? Best regards, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: OOM killer not nearly agressive enough? 2020-01-07 20:44 OOM killer not nearly agressive enough? Pavel Machek @ 2020-01-09 11:56 ` Michal Hocko 2020-01-09 21:03 ` Pavel Machek 2020-01-09 21:05 ` Pavel Machek 0 siblings, 2 replies; 12+ messages in thread From: Michal Hocko @ 2020-01-09 11:56 UTC (permalink / raw) To: Pavel Machek; +Cc: kernel list, Andrew Morton, linux-mm, akpm On Tue 07-01-20 21:44:12, Pavel Machek wrote: > Hi! > > I updated my userspace to x86-64, and now chromium likes to eat all > the memory and bring the system to standstill. > > Unfortunately, OOM killer does not react: > > I'm now running "ps aux", and it prints one line every 20 seconds or > more. Do we agree that is "unusable" system? I attempted to do kill > from other session. Does sysrq+f help? > Do we agree that OOM killer should have reacted way sooner? This is impossible to answer without knowing what was going on at the time. Was the system thrashing over page cache/swap? In other words, is the system completely out of memory, or is it refaulting the working set all the time because it doesn't fit into memory? > Is there something I can tweak to make it behave more reasonably? PSI-based early OOM killing might help. See https://github.com/facebookincubator/oomd -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: OOM killer not nearly agressive enough? 2020-01-09 11:56 ` Michal Hocko @ 2020-01-09 21:03 ` Pavel Machek 2020-01-09 21:25 ` Michal Hocko 2020-01-09 21:46 ` Vito Caputo 2020-01-09 21:05 ` Pavel Machek 1 sibling, 2 replies; 12+ messages in thread From: Pavel Machek @ 2020-01-09 21:03 UTC (permalink / raw) To: Michal Hocko; +Cc: kernel list, Andrew Morton, linux-mm, akpm [-- Attachment #1: Type: text/plain, Size: 1541 bytes --] On Thu 2020-01-09 12:56:33, Michal Hocko wrote: > On Tue 07-01-20 21:44:12, Pavel Machek wrote: > > Hi! > > > > I updated my userspace to x86-64, and now chromium likes to eat all > > the memory and bring the system to standstill. > > > > Unfortunately, OOM killer does not react: > > > > I'm now running "ps aux", and it prints one line every 20 seconds or > > more. Do we agree that is "unusable" system? I attempted to do kill > > from other session. > > Does sysrq+f help? May try that next time. > > Do we agree that OOM killer should have reacted way sooner? > > This is impossible to answer without knowing what was going on at the > time. Was the system threshing over page cache/swap? In other words, is > the system completely out of memory or refaulting the working set all > the time because it doesn't fit into memory? Swap was full, so "completely out of memory", I guess. Chromium does that fairly often :-(. > > Is there something I can tweak to make it behave more reasonably? > > PSI based early OOM killing might help. See https://github.com/facebookincubator/oomd Um. Before doing that... is there some knob somewhere saying "hey oomkiller, one hour to recover machine is a bit too much, can you please react sooner"? PSI is completely different system, but I guess I should attempt to tweak the existing one first... 
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: OOM killer not nearly agressive enough? 2020-01-09 21:03 ` Pavel Machek @ 2020-01-09 21:25 ` Michal Hocko 2020-01-09 22:48 ` Pavel Machek 2020-01-09 21:46 ` Vito Caputo 1 sibling, 1 reply; 12+ messages in thread From: Michal Hocko @ 2020-01-09 21:25 UTC (permalink / raw) To: Pavel Machek; +Cc: kernel list, Andrew Morton, linux-mm, akpm On Thu 09-01-20 22:03:07, Pavel Machek wrote: > On Thu 2020-01-09 12:56:33, Michal Hocko wrote: > > On Tue 07-01-20 21:44:12, Pavel Machek wrote: > > > Hi! > > > > > > I updated my userspace to x86-64, and now chromium likes to eat all > > > the memory and bring the system to standstill. > > > > > > Unfortunately, OOM killer does not react: > > > > > > I'm now running "ps aux", and it prints one line every 20 seconds or > > > more. Do we agree that is "unusable" system? I attempted to do kill > > > from other session. > > > > Does sysrq+f help? > > May try that next time. > > > > Do we agree that OOM killer should have reacted way sooner? > > > > This is impossible to answer without knowing what was going on at the > > time. Was the system threshing over page cache/swap? In other words, is > > the system completely out of memory or refaulting the working set all > > the time because it doesn't fit into memory? > > Swap was full, so "completely out of memory", I guess. Chromium does > that fairly often :-(. The oom heuristic is based on the reclaim failure. If the reclaim makes some progress then the oom killer is not hit. Have a look at should_reclaim_retry for more details. > > > Is there something I can tweak to make it behave more reasonably? > > > > PSI based early OOM killing might help. See https://github.com/facebookincubator/oomd > > Um. Before doing that... is there some knob somewhere saying "hey > oomkiller, one hour to recover machine is a bit too much, can you > please react sooner"? No, there is nothing like that. 
> PSI is completely different system, but I guess > I should attempt to tweak the existing one first... PSI measures the cost of allocation (among other things), and that can give you some idea of how much time is spent getting memory. Userspace can implement a policy based on that and act. The kernel OOM killer is the last resort, for when there is really no memory left to allocate. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 12+ messages in thread
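[Editorial sketch] The "userspace policy based on PSI" idea above can be illustrated by reading the memory pressure file that PSI-enabled kernels expose. This is a minimal sketch, not how oomd works: it assumes a kernel built with CONFIG_PSI (Linux 4.20 or later) that provides /proc/pressure/memory, and the `under_pressure` helper and its 10% threshold are hypothetical names and values chosen only for illustration.

```python
from pathlib import Path

# Assumption: the kernel exposes PSI here (CONFIG_PSI, Linux 4.20+).
PSI_MEMORY = Path("/proc/pressure/memory")


def parse_psi(text):
    """Parse PSI text into {"some": {"avg10": ..., ...}, "full": {...}}.

    Each input line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


def under_pressure(psi, threshold_pct=10.0):
    """Hypothetical policy: report pressure when the 'full' 10-second
    average (percentage of time all non-idle tasks were stalled on
    memory) reaches the threshold. The threshold is an illustrative guess."""
    return psi.get("full", {}).get("avg10", 0.0) >= threshold_pct


def read_memory_pressure(path=PSI_MEMORY):
    """Return parsed PSI data, or None when the file is unavailable."""
    if not Path(path).exists():
        return None
    return parse_psi(Path(path).read_text())
```

A daemon built on this would call `read_memory_pressure()` periodically and decide what to kill when `under_pressure()` holds for long enough; choosing the victim is the hard part and is left out here.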
* Re: OOM killer not nearly agressive enough? 2020-01-09 21:25 ` Michal Hocko @ 2020-01-09 22:48 ` Pavel Machek 2020-01-10 1:24 ` Shakeel Butt 2020-01-10 6:31 ` Michal Hocko 0 siblings, 2 replies; 12+ messages in thread From: Pavel Machek @ 2020-01-09 22:48 UTC (permalink / raw) To: Michal Hocko; +Cc: kernel list, Andrew Morton, linux-mm, akpm [-- Attachment #1: Type: text/plain, Size: 2465 bytes --] Hi! > > > > Do we agree that OOM killer should have reacted way sooner? > > > > > > This is impossible to answer without knowing what was going on at the > > > time. Was the system threshing over page cache/swap? In other words, is > > > the system completely out of memory or refaulting the working set all > > > the time because it doesn't fit into memory? > > > > Swap was full, so "completely out of memory", I guess. Chromium does > > that fairly often :-(. > > The oom heuristic is based on the reclaim failure. If the reclaim makes > some progress then the oom killer is not hit. Have a look at > should_reclaim_retry for more details. Thanks for the pointer. I guess setting MAX_RECLAIM_RETRIES to 1 is not something you'd recommend? :-). > > PSI is completely different system, but I guess > > I should attempt to tweak the existing one first... > > PSI is measuring the cost of the allocation (among other things) and > that can give you some idea on how much time is spent to get memory. > Userspace can implement a policy based on that and act. The kernel oom > killer is the last resort when there is really no memory to > allocate. So what I'm seeing is a system that is unresponsive, easily for an hour. Sometimes, I'm able to log in. When I could do that, the system was absurdly slow, with ps taking more than 10 seconds per line. ps on my system normally takes 300 msec; my estimate for the slow case is 2000 seconds, which is a slowdown by a factor of more than 6000. That would mean an X terminal taking something like two hours to open... that's not really usable. 
DRAM is in the 100 nsec range, disk is in the 10 msec range; so the worst-case slowdown is somewhere in the 100000x range. (Actually, in the worst case userland will make no progress at all, since you can need 4+ pages in a single CPU instruction, right?) But the kernel is happy; the system is unusable and will stay unusable for an hour or more, and there's not much the user can do. (Besides sysrq, thanks for the hint). Can we do better? This is the equivalent of a system crash, and it is _way_ too easy to trigger. Should we do better by default? Dunno. If the user moved the mouse, and the cursor did not move for 10 seconds, perhaps it is time for an oom kill? Or should I add more swap? Is it terrible to place swap on SSD? Best regards, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
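[Editorial sketch] The two slowdown estimates in the message above are simple ratios, and it may help to see them computed. The figures are the ones quoted in the message (300 msec versus 2000 seconds for ps, 100 nsec for DRAM versus 10 msec for disk); the `slowdown` helper is a hypothetical name used only for this illustration.

```python
def slowdown(slow, fast):
    """Ratio of the slow duration to the fast duration (same units)."""
    return slow / fast


# Observed case: ps normally takes 0.3 s, estimated 2000 s under memory pressure.
observed = slowdown(2000.0, 0.3)

# Worst case: every memory access becomes a disk access.
DRAM_NS = 100                 # DRAM latency, order of magnitude, in nanoseconds
DISK_NS = 10 * 1_000_000      # 10 ms disk latency expressed in nanoseconds
worst_case = slowdown(DISK_NS, DRAM_NS)

print(f"observed slowdown:   ~{observed:.0f}x")
print(f"worst-case slowdown: {worst_case:.0f}x")
```

The observed ratio comes out between 6000 and 7000, consistent with the rough "6000x" figure in the message, and it is still more than an order of magnitude below the 100000x worst case.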
* Re: OOM killer not nearly agressive enough? 2020-01-09 22:48 ` Pavel Machek @ 2020-01-10 1:24 ` Shakeel Butt 2020-01-10 6:31 ` Michal Hocko 1 sibling, 0 replies; 12+ messages in thread From: Shakeel Butt @ 2020-01-10 1:24 UTC (permalink / raw) To: Pavel Machek Cc: Michal Hocko, kernel list, Andrew Morton, Linux MM, Andrew Morton On Thu, Jan 9, 2020 at 2:49 PM Pavel Machek <pavel@ucw.cz> wrote: > > Hi! > > > > > > Do we agree that OOM killer should have reacted way sooner? > > > > > > > > This is impossible to answer without knowing what was going on at the > > > > time. Was the system threshing over page cache/swap? In other words, is > > > > the system completely out of memory or refaulting the working set all > > > > the time because it doesn't fit into memory? > > > > > > Swap was full, so "completely out of memory", I guess. Chromium does > > > that fairly often :-(. > > > > The oom heuristic is based on the reclaim failure. If the reclaim makes > > some progress then the oom killer is not hit. Have a look at > > should_reclaim_retry for more details. > > Thanks for pointer. > > I guess setting MAX_RECLAIM_RETRIES to 1 is not something you'd > recommend? :-). > > > > PSI is completely different system, but I guess > > > I should attempt to tweak the existing one first... > > > > PSI is measuring the cost of the allocation (among other things) and > > that can give you some idea on how much time is spent to get memory. > > Userspace can implement a policy based on that and act. The kernel oom > > killer is the last resort when there is really no memory to > > allocate. > > So what I'm seeing is system that is unresponsive, easily for an hour. > > Sometimes, I'm able to log in. When I could do that, system was > absurdly slow, like ps printing at more than 10 seconds per line. > ps on my system takes 300msec, estimate in the slow case would be 2000 > seconds, that is slowdown by factor of 6000x. That would be X terminal > opening in like two hours... 
that's not really usable. > > DRAM is in 100nsec range, disk is in 10msec range; so worst case > slowdown is somewhere in 100000x range. (Actually, in the worst case > userland will do no progress at all, since you can need at 4+ pages in > single CPU instruction, right?) > > But kernel is happy; system is unusable and will stay unusable for > hour or more, and there's not much user can do. (Besides sysrq, thanks > for the hint). > > Can we do better? This is equivalent of system crash, and it is _way_ > too easy to trigger. Should we do better by default? > > Dunno. If user moved the mouse, and cursor did not move for 10 > seconds, perhaps it is time for oom kill? > > Or should I add more swap? Is it terrible to place swap on SSD? > What's the kernel version? How much memory is anon and file pages? What's your swap to DRAM ratio? Are you using in-memory compression based swap? Have you tried to disable swap completely? Shakeel ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: OOM killer not nearly agressive enough? 2020-01-09 22:48 ` Pavel Machek 2020-01-10 1:24 ` Shakeel Butt @ 2020-01-10 6:31 ` Michal Hocko 1 sibling, 0 replies; 12+ messages in thread From: Michal Hocko @ 2020-01-10 6:31 UTC (permalink / raw) To: Pavel Machek; +Cc: kernel list, Andrew Morton, linux-mm, akpm On Thu 09-01-20 23:48:45, Pavel Machek wrote: > Hi! > > > > > > Do we agree that OOM killer should have reacted way sooner? > > > > > > > > This is impossible to answer without knowing what was going on at the > > > > time. Was the system threshing over page cache/swap? In other words, is > > > > the system completely out of memory or refaulting the working set all > > > > the time because it doesn't fit into memory? > > > > > > Swap was full, so "completely out of memory", I guess. Chromium does > > > that fairly often :-(. > > > > The oom heuristic is based on the reclaim failure. If the reclaim makes > > some progress then the oom killer is not hit. Have a look at > > should_reclaim_retry for more details. > > Thanks for pointer. > > I guess setting MAX_RECLAIM_RETRIES to 1 is not something you'd > recommend? :-). You can certainly play with that. I am not overly optimistic that it would help, though, because a symptom of a thrashing system is that we do not even reach this point. Pages are simply recycled, but they evict other parts of the hot working set. But I am only guessing at what the problem is in your case. Anyway, a low MAX_RECLAIM_RETRIES would tend to be more timing sensitive in general. If reclaim progress cannot be made because of IO latencies or other resource depletion, then the OOM would be declared too early. The current MAX_RECLAIM_RETRIES is not something we have tuned in any sense. I remember that changing it didn't make much difference unless the number was really high, which would be a signal that the reclaim is not throttled very well. 
> > > PSI is completely different system, but I guess > > > I should attempt to tweak the existing one first... > > > > PSI is measuring the cost of the allocation (among other things) and > > that can give you some idea on how much time is spent to get memory. > > Userspace can implement a policy based on that and act. The kernel oom > > killer is the last resort when there is really no memory to > > allocate. > > So what I'm seeing is system that is unresponsive, easily for an hour. > > Sometimes, I'm able to log in. When I could do that, system was > absurdly slow, like ps printing at more than 10 seconds per line. > ps on my system takes 300msec, estimate in the slow case would be 2000 > seconds, that is slowdown by factor of 6000x. That would be X terminal > opening in like two hours... that's not really usable. It would be great to find out what the bottleneck is. Is the allocator stuck in memory reclaim? Waiting on some lock? Reclaiming pages which are then stolen by other contending processes? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 12+ messages in thread
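[Editorial sketch] The point made in this subthread, that the OOM killer is only reached when reclaim repeatedly makes no progress, can be illustrated with a toy model. This is a deliberately simplified sketch and not the kernel's should_reclaim_retry: it models only a "no progress" counter that is reset by any successful reclaim, and it ignores the watermark checks and everything else the real function does. The value 16 for MAX_RECLAIM_RETRIES is an assumption about kernels of that era and should be checked against the source tree; `rounds_until_oom` is a hypothetical name.

```python
# Assumption: value used by kernels of that era; verify in your tree.
MAX_RECLAIM_RETRIES = 16


def rounds_until_oom(progress_per_round, max_retries=MAX_RECLAIM_RETRIES):
    """Toy model: return the 1-based reclaim round at which OOM would be
    declared, or None if it never is.

    progress_per_round: iterable with the number of pages reclaimed in
    each round (0 means the round made no progress).
    """
    no_progress_loops = 0
    for round_no, reclaimed in enumerate(progress_per_round, start=1):
        if reclaimed > 0:
            no_progress_loops = 0   # any progress resets the counter
        else:
            no_progress_loops += 1
        if no_progress_loops > max_retries:
            return round_no
    return None


# Truly out of memory: nothing is ever reclaimed, so OOM is declared.
print(rounds_until_oom([0] * 20))

# Thrashing: a page is reclaimed every so often, so the counter keeps
# being reset and OOM is never declared, however slow the system gets.
print(rounds_until_oom(([0] * 10 + [1]) * 100))
```

In this model, lowering `max_retries` to 1 makes the first case trigger sooner but changes nothing for the thrashing pattern unless the gaps between successful reclaims exceed the limit, which matches the scepticism expressed in the reply above.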
* Re: OOM killer not nearly agressive enough? 2020-01-09 21:03 ` Pavel Machek 2020-01-09 21:25 ` Michal Hocko @ 2020-01-09 21:46 ` Vito Caputo 2020-01-09 21:58 ` Michal Hocko 1 sibling, 1 reply; 12+ messages in thread From: Vito Caputo @ 2020-01-09 21:46 UTC (permalink / raw) To: Pavel Machek; +Cc: Michal Hocko, kernel list, Andrew Morton, linux-mm, akpm On Thu, Jan 09, 2020 at 10:03:07PM +0100, Pavel Machek wrote: > On Thu 2020-01-09 12:56:33, Michal Hocko wrote: > > On Tue 07-01-20 21:44:12, Pavel Machek wrote: > > > Hi! > > > > > > I updated my userspace to x86-64, and now chromium likes to eat all > > > the memory and bring the system to standstill. > > > > > > Unfortunately, OOM killer does not react: > > > > > > I'm now running "ps aux", and it prints one line every 20 seconds or > > > more. Do we agree that is "unusable" system? I attempted to do kill > > > from other session. > > > > Does sysrq+f help? > > May try that next time. > > > > Do we agree that OOM killer should have reacted way sooner? > > > > This is impossible to answer without knowing what was going on at the > > time. Was the system threshing over page cache/swap? In other words, is > > the system completely out of memory or refaulting the working set all > > the time because it doesn't fit into memory? > > Swap was full, so "completely out of memory", I guess. Chromium does > that fairly often :-(. > Have you considered restricting its memory limits a la `ulimit -m`? I've taken to running browsers in nspawn containers for general isolation improvements, but this also makes it easy to set cgroup resource limits like memcg. i.e. --property MemoryMax=2G This prevents the browser from bogging down the entire system, but it doesn't prevent thrashing before FF OOMs within its control group. I do feel there's a problem with the kernel's reclaim algorithm, it seems far too willing to evict file-backed pages that are recently in use. 
But at least with memcg this behavior is isolated to the cgroup, though it still generates a crapload of disk reads from all the thrashing. Regards, Vito Caputo ^ permalink raw reply [flat|nested] 12+ messages in thread
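[Editorial sketch] To make the MemoryMax=2G suggestion above concrete, the following sketch converts a systemd-style size string to bytes and reads the resulting limit back from a cgroup v2 directory. It assumes cgroup v2, where the limit appears in a `memory.max` file (either a byte count or the literal `max`) next to `memory.current`, and it assumes systemd's convention that K, M, G and T suffixes are base 1024. The function names are hypothetical, and the cgroup directory to pass in depends on the system.

```python
from pathlib import Path

# systemd parses K/M/G/T size suffixes with base 1024.
_SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as "2G" or "4096" to a number of bytes."""
    text = text.strip()
    if text and text[-1].upper() in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1].upper()]
    return int(text)


def parse_memory_max(text):
    """Parse the contents of memory.max; None means no limit ("max")."""
    text = text.strip()
    return None if text == "max" else int(text)


def headroom(cgroup_dir):
    """Bytes left before the cgroup hits its limit, or None if unlimited.

    cgroup_dir is the cgroup v2 directory of the container or scope;
    its location is system specific and not assumed here.
    """
    directory = Path(cgroup_dir)
    limit = parse_memory_max((directory / "memory.max").read_text())
    current = int((directory / "memory.current").read_text())
    return None if limit is None else limit - current


print(parse_size("2G"))  # the limit from the message, in bytes
```

Watching `headroom()` shrink toward zero for the browser's cgroup gives an early hint that the group is about to start thrashing within its limit, before the memcg OOM killer acts.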
* Re: OOM killer not nearly agressive enough? 2020-01-09 21:46 ` Vito Caputo @ 2020-01-09 21:58 ` Michal Hocko 0 siblings, 0 replies; 12+ messages in thread From: Michal Hocko @ 2020-01-09 21:58 UTC (permalink / raw) To: Vito Caputo; +Cc: Pavel Machek, kernel list, Andrew Morton, linux-mm, akpm On Thu 09-01-20 13:46:04, Vito Caputo wrote: > On Thu, Jan 09, 2020 at 10:03:07PM +0100, Pavel Machek wrote: > > On Thu 2020-01-09 12:56:33, Michal Hocko wrote: > > > On Tue 07-01-20 21:44:12, Pavel Machek wrote: > > > > Hi! > > > > > > > > I updated my userspace to x86-64, and now chromium likes to eat all > > > > the memory and bring the system to standstill. > > > > > > > > Unfortunately, OOM killer does not react: > > > > > > > > I'm now running "ps aux", and it prints one line every 20 seconds or > > > > more. Do we agree that is "unusable" system? I attempted to do kill > > > > from other session. > > > > > > Does sysrq+f help? > > > > May try that next time. > > > > > > Do we agree that OOM killer should have reacted way sooner? > > > > > > This is impossible to answer without knowing what was going on at the > > > time. Was the system threshing over page cache/swap? In other words, is > > > the system completely out of memory or refaulting the working set all > > > the time because it doesn't fit into memory? > > > > Swap was full, so "completely out of memory", I guess. Chromium does > > that fairly often :-(. > > > > Have you considered restricting its memory limits a la `ulimit -m`? The kernel ignores RLIMIT_RSS. Unless the browser takes it into consideration then I do not see how that would help. > I've taken to running browsers in nspawn containers for general > isolation improvements, but this also makes it easy to set cgroup > resource limits like memcg. i.e. --property MemoryMax=2G Yes, this should help to isolate the problem. 
> This prevents the browser from bogging down the entire system, but it > doesn't prevent thrashing before FF OOMs within its control group. > > I do feel there's a problem with the kernel's reclaim algorithm, it > seems far too willing to evict file-backed pages that are recently in > use. It is true that memory reclaim is quite biased toward page cache reclaim unless there is only a very small amount of page cache. Page cache refaults are considered during reclaim, but I am afraid there are still corner cases where the workload might end up thrashing, be it on the page cache or on anonymous memory, depending on the workload. Anyway, getting data from real workloads is always good so that we can think about improving the existing heuristics. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: OOM killer not nearly agressive enough? 2020-01-09 11:56 ` Michal Hocko 2020-01-09 21:03 ` Pavel Machek @ 2020-01-09 21:05 ` Pavel Machek 2020-01-09 21:28 ` Michal Hocko 1 sibling, 1 reply; 12+ messages in thread From: Pavel Machek @ 2020-01-09 21:05 UTC (permalink / raw) To: Michal Hocko; +Cc: kernel list, Andrew Morton, linux-mm, akpm [-- Attachment #1: Type: text/plain, Size: 1173 bytes --] On Thu 2020-01-09 12:56:33, Michal Hocko wrote: > On Tue 07-01-20 21:44:12, Pavel Machek wrote: > > Hi! > > > > I updated my userspace to x86-64, and now chromium likes to eat all > > the memory and bring the system to standstill. > > > > Unfortunately, OOM killer does not react: > > > > I'm now running "ps aux", and it prints one line every 20 seconds or > > more. Do we agree that is "unusable" system? I attempted to do kill > > from other session. > > Does sysrq+f help? > > > Do we agree that OOM killer should have reacted way sooner? > > This is impossible to answer without knowing what was going on at the > time. Was the system threshing over page cache/swap? In other words, is > the system completely out of memory or refaulting the working set all > the time because it doesn't fit into memory? What statistics are best to collect? Would the memory lines from top do the trick? I normally have gkrellm running, but I found its results hard to interpret. Best regards, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: OOM killer not nearly agressive enough? 2020-01-09 21:05 ` Pavel Machek @ 2020-01-09 21:28 ` Michal Hocko 0 siblings, 0 replies; 12+ messages in thread From: Michal Hocko @ 2020-01-09 21:28 UTC (permalink / raw) To: Pavel Machek; +Cc: kernel list, Andrew Morton, linux-mm, akpm On Thu 09-01-20 22:05:36, Pavel Machek wrote: > On Thu 2020-01-09 12:56:33, Michal Hocko wrote: > > On Tue 07-01-20 21:44:12, Pavel Machek wrote: > > > Hi! > > > > > > I updated my userspace to x86-64, and now chromium likes to eat all > > > the memory and bring the system to standstill. > > > > > > Unfortunately, OOM killer does not react: > > > > > > I'm now running "ps aux", and it prints one line every 20 seconds or > > > more. Do we agree that is "unusable" system? I attempted to do kill > > > from other session. > > > > Does sysrq+f help? > > > > > Do we agree that OOM killer should have reacted way sooner? > > > > This is impossible to answer without knowing what was going on at the > > time. Was the system threshing over page cache/swap? In other words, is > > the system completely out of memory or refaulting the working set all > > the time because it doesn't fit into memory? > > What statistics are best to collect? Would the memory lines from top > do the trick? I normally have gkrellm running, but I found its results > hard to interpret. /proc/vmstat (collected periodically) gives the most comprehensive picture of the state of MM. Interpreting the numbers is far from trivial, though. It usually requires analyzing multiple snapshots to see how the situation evolves. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 12+ messages in thread
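[Editorial sketch] Collecting /proc/vmstat periodically and comparing snapshots, as suggested above, could look like the following. It is a minimal sketch: the file is a list of "name value" lines, and the counter names in `WATCH` are assumptions about which fields are interesting for reclaim and swap activity; their exact names vary between kernel versions, so missing ones are simply skipped. The function names are hypothetical.

```python
import time
from pathlib import Path

VMSTAT = Path("/proc/vmstat")

# Assumed-interesting counters; names differ between kernel versions.
WATCH = (
    "pgscan_kswapd", "pgscan_direct",
    "pgsteal_kswapd", "pgsteal_direct",
    "pswpin", "pswpout",
    "workingset_refault",
)


def parse_vmstat(text):
    """Parse "name value" lines into a dict of integers."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def delta(before, after, keys=None):
    """Per-counter change between two snapshots.

    Only names present in both snapshots are reported; when keys is
    None, every counter of the later snapshot is considered.
    """
    names = keys if keys is not None else sorted(after)
    return {k: after[k] - before[k] for k in names if k in before and k in after}


def collect(interval_s=5.0, count=12, path=VMSTAT):
    """Take `count` snapshots, `interval_s` seconds apart, and return
    the list of deltas between consecutive snapshots for WATCH."""
    deltas = []
    previous = parse_vmstat(Path(path).read_text())
    for _ in range(count - 1):
        time.sleep(interval_s)
        current = parse_vmstat(Path(path).read_text())
        deltas.append(delta(previous, current, keys=WATCH))
        previous = current
    return deltas
```

Steadily growing scan counters with little stolen, or heavy swap-in and swap-out in the same interval, are the kind of trend that only shows up when several snapshots are compared. Running the collector from a shell started before the problem begins is probably necessary, since starting anything new on an already thrashing system can take minutes.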