* [RFC PATCH] mm, oom: disable dump_tasks by default @ 2019-09-03 14:45 Michal Hocko 2019-09-03 15:02 ` Qian Cai 2019-09-03 20:52 ` Tetsuo Handa 0 siblings, 2 replies; 14+ messages in thread From: Michal Hocko @ 2019-09-03 14:45 UTC (permalink / raw) To: linux-mm; +Cc: Andrew Morton, Tetsuo Handa, David Rientjes, LKML, Michal Hocko From: Michal Hocko <mhocko@suse.com> dump_tasks has been introduced by quite some time ago fef1bdd68c81 ("oom: add sysctl to enable task memory dump"). It's primary purpose is to help analyse oom victim selection decision. This has been certainly useful at times when the heuristic to chose a victim was much more volatile. Since a63d83f427fb ("oom: badness heuristic rewrite") situation became much more stable (mostly because the only selection criterion is the memory usage) and reports about a wrong process to be shot down have become effectively non-existent. dump_tasks can generate a lot of output to the kernel log. It is not uncommon that even relative small system has hundreds of tasks running. Generating a lot of output to the kernel log both makes the oom report less convenient to process and also induces a higher load on the printk subsystem which can lead to other problems (e.g. longer stalls to flush all the data to consoles). Therefore change the default of oom_dump_tasks to not print the task list by default. The sysctl remains in place for anybody who might need to get this additional information. The oom report still provides an information about the allocation context and the state of the MM subsystem which should be sufficient to analyse most of the oom situations. Signed-off-by: Michal Hocko <mhocko@suse.com> --- mm/oom_kill.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/oom_kill.c b/mm/oom_kill.c index eda2e2a0bdc6..d0353705c6e6 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -52,7 +52,7 @@ int sysctl_panic_on_oom; int sysctl_oom_kill_allocating_task; -int sysctl_oom_dump_tasks = 1; +int sysctl_oom_dump_tasks; /* * Serializes oom killer invocations (out_of_memory()) from all contexts to -- 2.20.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] mm, oom: disable dump_tasks by default 2019-09-03 14:45 [RFC PATCH] mm, oom: disable dump_tasks by default Michal Hocko @ 2019-09-03 15:02 ` Qian Cai 2019-09-03 15:13 ` Michal Hocko 2019-09-03 20:52 ` Tetsuo Handa 1 sibling, 1 reply; 14+ messages in thread From: Qian Cai @ 2019-09-03 15:02 UTC (permalink / raw) To: Michal Hocko, linux-mm Cc: Andrew Morton, Tetsuo Handa, David Rientjes, LKML, Michal Hocko On Tue, 2019-09-03 at 16:45 +0200, Michal Hocko wrote: > From: Michal Hocko <mhocko@suse.com> > > dump_tasks has been introduced by quite some time ago fef1bdd68c81 > ("oom: add sysctl to enable task memory dump"). It's primary purpose is > to help analyse oom victim selection decision. This has been certainly > useful at times when the heuristic to chose a victim was much more > volatile. Since a63d83f427fb ("oom: badness heuristic rewrite") > situation became much more stable (mostly because the only selection > criterion is the memory usage) and reports about a wrong process to > be shot down have become effectively non-existent. Well, I still see OOM sometimes kills wrong processes like ssh, systemd processes while LTP OOM tests with staight-forward allocation patterns. I just have not had a chance to debug them fully. The situation could be worse with more complex allocations like random stress or fuzzy testing. > > dump_tasks can generate a lot of output to the kernel log. It is not > uncommon that even relative small system has hundreds of tasks running. > Generating a lot of output to the kernel log both makes the oom report > less convenient to process and also induces a higher load on the printk > subsystem which can lead to other problems (e.g. longer stalls to flush > all the data to consoles). It is only generate output for the victim process where I tested on those large NUMA machines and the output is fairly manageable. > > Therefore change the default of oom_dump_tasks to not print the task > list by default. The sysctl remains in place for anybody who might need > to get this additional information. The oom report still provides an > information about the allocation context and the state of the MM > subsystem which should be sufficient to analyse most of the oom > situations. > > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- > mm/oom_kill.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index eda2e2a0bdc6..d0353705c6e6 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -52,7 +52,7 @@ > > int sysctl_panic_on_oom; > int sysctl_oom_kill_allocating_task; > -int sysctl_oom_dump_tasks = 1; > +int sysctl_oom_dump_tasks; > > /* > * Serializes oom killer invocations (out_of_memory()) from all contexts to ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] mm, oom: disable dump_tasks by default 2019-09-03 15:02 ` Qian Cai @ 2019-09-03 15:13 ` Michal Hocko 2019-09-03 15:32 ` Qian Cai 2019-09-05 16:10 ` Qian Cai 0 siblings, 2 replies; 14+ messages in thread From: Michal Hocko @ 2019-09-03 15:13 UTC (permalink / raw) To: Qian Cai; +Cc: linux-mm, Andrew Morton, Tetsuo Handa, David Rientjes, LKML On Tue 03-09-19 11:02:46, Qian Cai wrote: > On Tue, 2019-09-03 at 16:45 +0200, Michal Hocko wrote: > > From: Michal Hocko <mhocko@suse.com> > > > > dump_tasks has been introduced by quite some time ago fef1bdd68c81 > > ("oom: add sysctl to enable task memory dump"). It's primary purpose is > > to help analyse oom victim selection decision. This has been certainly > > useful at times when the heuristic to chose a victim was much more > > volatile. Since a63d83f427fb ("oom: badness heuristic rewrite") > > situation became much more stable (mostly because the only selection > > criterion is the memory usage) and reports about a wrong process to > > be shot down have become effectively non-existent. > > Well, I still see OOM sometimes kills wrong processes like ssh, systemd > processes while LTP OOM tests with staight-forward allocation patterns. Please report those. Most cases I have seen so far just turned out to work as expected and memory hogs just used oom_score_adj or similar. > I just > have not had a chance to debug them fully. The situation could be worse with > more complex allocations like random stress or fuzzy testing. Nothing really prevents enabling the sysctl when doing OOM oriented testing. > > dump_tasks can generate a lot of output to the kernel log. It is not > > uncommon that even relative small system has hundreds of tasks running. > > Generating a lot of output to the kernel log both makes the oom report > > less convenient to process and also induces a higher load on the printk > > subsystem which can lead to other problems (e.g. longer stalls to flush > > all the data to consoles). > > It is only generate output for the victim process where I tested on those large > NUMA machines and the output is fairly manageable. The main question here is whether that information is useful by _default_ because it is certainly not free. It takes both time to crawl all processes and cpu cycles to get that information to the console because printk is not free either. So if it more of "nice to have" than necessary for oom analysis then it should be disabled by default IMHO. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] mm, oom: disable dump_tasks by default 2019-09-03 15:13 ` Michal Hocko @ 2019-09-03 15:32 ` Qian Cai 2019-09-03 19:12 ` Michal Hocko 2019-09-05 16:10 ` Qian Cai 1 sibling, 1 reply; 14+ messages in thread From: Qian Cai @ 2019-09-03 15:32 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-mm, Andrew Morton, Tetsuo Handa, David Rientjes, LKML On Tue, 2019-09-03 at 17:13 +0200, Michal Hocko wrote: > On Tue 03-09-19 11:02:46, Qian Cai wrote: > > On Tue, 2019-09-03 at 16:45 +0200, Michal Hocko wrote: > > > From: Michal Hocko <mhocko@suse.com> > > > > > > dump_tasks has been introduced by quite some time ago fef1bdd68c81 > > > ("oom: add sysctl to enable task memory dump"). It's primary purpose is > > > to help analyse oom victim selection decision. This has been certainly > > > useful at times when the heuristic to chose a victim was much more > > > volatile. Since a63d83f427fb ("oom: badness heuristic rewrite") > > > situation became much more stable (mostly because the only selection > > > criterion is the memory usage) and reports about a wrong process to > > > be shot down have become effectively non-existent. > > > > Well, I still see OOM sometimes kills wrong processes like ssh, systemd > > processes while LTP OOM tests with staight-forward allocation patterns. > > Please report those. Most cases I have seen so far just turned out to > work as expected and memory hogs just used oom_score_adj or similar. > > > I just > > have not had a chance to debug them fully. The situation could be worse with > > more complex allocations like random stress or fuzzy testing. > > Nothing really prevents enabling the sysctl when doing OOM oriented > testing. > > > > dump_tasks can generate a lot of output to the kernel log. It is not > > > uncommon that even relative small system has hundreds of tasks running. > > > Generating a lot of output to the kernel log both makes the oom report > > > less convenient to process and also induces a higher load on the printk > > > subsystem which can lead to other problems (e.g. longer stalls to flush > > > all the data to consoles). > > > > It is only generate output for the victim process where I tested on those > > large > > NUMA machines and the output is fairly manageable. > > The main question here is whether that information is useful by > _default_ because it is certainly not free. It takes both time to crawl > all processes and cpu cycles to get that information to the console > because printk is not free either. So if it more of "nice to have" than > necessary for oom analysis then it should be disabled by default IMHO. It also feels like more a band-aid micro-optimization with the side-effect that affecting debuggability, as there could be loads of console output anyway during a kernel OOM event including failed allocation warnings. I suppose if you want to change the default behavior, the bar is high with more data and justification. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] mm, oom: disable dump_tasks by default 2019-09-03 15:32 ` Qian Cai @ 2019-09-03 19:12 ` Michal Hocko 0 siblings, 0 replies; 14+ messages in thread From: Michal Hocko @ 2019-09-03 19:12 UTC (permalink / raw) To: Qian Cai; +Cc: linux-mm, Andrew Morton, Tetsuo Handa, David Rientjes, LKML On Tue 03-09-19 11:32:58, Qian Cai wrote: > On Tue, 2019-09-03 at 17:13 +0200, Michal Hocko wrote: > > On Tue 03-09-19 11:02:46, Qian Cai wrote: > > > On Tue, 2019-09-03 at 16:45 +0200, Michal Hocko wrote: > > > > From: Michal Hocko <mhocko@suse.com> > > > > > > > > dump_tasks has been introduced by quite some time ago fef1bdd68c81 > > > > ("oom: add sysctl to enable task memory dump"). It's primary purpose is > > > > to help analyse oom victim selection decision. This has been certainly > > > > useful at times when the heuristic to chose a victim was much more > > > > volatile. Since a63d83f427fb ("oom: badness heuristic rewrite") > > > > situation became much more stable (mostly because the only selection > > > > criterion is the memory usage) and reports about a wrong process to > > > > be shot down have become effectively non-existent. > > > > > > Well, I still see OOM sometimes kills wrong processes like ssh, systemd > > > processes while LTP OOM tests with staight-forward allocation patterns. > > > > Please report those. Most cases I have seen so far just turned out to > > work as expected and memory hogs just used oom_score_adj or similar. > > > > > I just > > > have not had a chance to debug them fully. The situation could be worse with > > > more complex allocations like random stress or fuzzy testing. > > > > Nothing really prevents enabling the sysctl when doing OOM oriented > > testing. > > > > > > dump_tasks can generate a lot of output to the kernel log. It is not > > > > uncommon that even relative small system has hundreds of tasks running. > > > > Generating a lot of output to the kernel log both makes the oom report > > > > less convenient to process and also induces a higher load on the printk > > > > subsystem which can lead to other problems (e.g. longer stalls to flush > > > > all the data to consoles). > > > > > > It is only generate output for the victim process where I tested on those > > > large > > > NUMA machines and the output is fairly manageable. > > > > The main question here is whether that information is useful by > > _default_ because it is certainly not free. It takes both time to crawl > > all processes and cpu cycles to get that information to the console > > because printk is not free either. So if it more of "nice to have" than > > necessary for oom analysis then it should be disabled by default IMHO. > > It also feels like more a band-aid micro-optimization with the side-effect that > affecting debuggability, as there could be loads of console output anyway during > a kernel OOM event including failed allocation warnings. I suppose if you want > to change the default behavior, the bar is high with more data and > justification. Any specific idea what that justification should be? Because this is not something that you could measure very easily. It is very subjective and far from black and white. And I am fully aware of that. Hence RFC. That is why we should apply some common sense and cost/benefit evaluation. The cost of an additional output should be quite clear. Now we can argue about the benefit. I argue that for an absolute majority of oom report I have seen throughout past many years the task list was the least useful information of the report. Sure I could go there and double check that the victim was selected as designed. In minority cases I could use that the task list to confirm that expected-to-be-victim had OOM_SCORE_ADJ_MIN for some reason and that was why something esle has been selected. Those configuration issues tend to be reproducible and are easier to debug with sysctl enabled. All that being said, arguments tend to weigh more towards IMO. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] mm, oom: disable dump_tasks by default 2019-09-03 15:13 ` Michal Hocko 2019-09-03 15:32 ` Qian Cai @ 2019-09-05 16:10 ` Qian Cai 1 sibling, 0 replies; 14+ messages in thread From: Qian Cai @ 2019-09-05 16:10 UTC (permalink / raw) To: Michal Hocko; +Cc: linux-mm, Andrew Morton, Tetsuo Handa, David Rientjes, LKML On Tue, 2019-09-03 at 17:13 +0200, Michal Hocko wrote: > On Tue 03-09-19 11:02:46, Qian Cai wrote: > > On Tue, 2019-09-03 at 16:45 +0200, Michal Hocko wrote: > > > From: Michal Hocko <mhocko@suse.com> > > > > > > dump_tasks has been introduced by quite some time ago fef1bdd68c81 > > > ("oom: add sysctl to enable task memory dump"). It's primary purpose is > > > to help analyse oom victim selection decision. This has been certainly > > > useful at times when the heuristic to chose a victim was much more > > > volatile. Since a63d83f427fb ("oom: badness heuristic rewrite") > > > situation became much more stable (mostly because the only selection > > > criterion is the memory usage) and reports about a wrong process to > > > be shot down have become effectively non-existent. > > > > Well, I still see OOM sometimes kills wrong processes like ssh, systemd > > processes while LTP OOM tests with staight-forward allocation patterns. > > Please report those. Most cases I have seen so far just turned out to > work as expected and memory hogs just used oom_score_adj or similar. Here is the one where oom01 should be one to be killed. [92598.855697][ T2588] Swap cache stats: add 105240923, delete 105250445, find 42196/101577 [92598.893970][ T2588] Free swap = 16383612kB [92598.913482][ T2588] Total swap = 16465916kB [92598.932938][ T2588] 7275091 pages RAM [92598.950212][ T2588] 0 pages HighMem/MovableOnly [92598.971539][ T2588] 1315554 pages reserved [92598.990698][ T2588] 16384 pages cma reserved [92599.010760][ T2588] Tasks state (memory values in pages): [92599.036265][ T2588] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name [92599.080129][ T2588] [ 1662] 0 1662 29511 1034 290816 244 0 systemd- journal [92599.126163][ T2588] [ 2586] 998 2586 508086 0 368640 1838 0 polkitd [92599.168706][ T2588] [ 2587] 0 2587 52786 0 421888 500 0 sssd [92599.210082][ T2588] [ 2588] 0 2588 31223 0 139264 195 0 irqbalance [92599.255606][ T2588] [ 2589] 81 2589 18381 0 167936 217 -900 dbus- daemon [92599.303678][ T2588] [ 2590] 0 2590 97260 193 372736 573 0 NetworkManager [92599.348957][ T2588] [ 2594] 0 2594 95350 1 229376 758 0 rngd [92599.390216][ T2588] [ 2598] 995 2598 7364 0 94208 103 0 chronyd [92599.432447][ T2588] [ 2629] 0 2629 106234 399 442368 3836 0 tuned [92599.473950][ T2588] [ 2638] 0 2638 23604 0 212992 240 -1000 sshd [92599.515158][ T2588] [ 2642] 0 2642 10392 0 102400 138 0 rhsmcertd [92599.560435][ T2588] [ 2691] 0 2691 21877 0 208896 277 0 systemd- logind [92599.605035][ T2588] [ 2700] 0 2700 3916 0 69632 45 0 agetty [92599.646750][ T2588] [ 2705] 0 2705 23370 0 225280 393 0 systemd [92599.688063][ T2588] [ 2730] 0 2730 37063 0 294912 667 0 (sd-pam) [92599.729028][ T2588] [ 2922] 0 2922 9020 0 98304 232 0 crond [92599.769130][ T2588] [ 3036] 0 3036 37797 1 307200 305 0 sshd [92599.813768][ T2588] [ 3057] 0 3057 37797 0 303104 335 0 sshd [92599.853450][ T2588] [ 3065] 0 3065 6343 1 86016 163 0 bash [92599.892899][ T2588] [ 38249] 0 38249 58330 293 221184 246 0 rsyslogd [92599.934457][ T2588] [ 11329] 0 11329 55131 73 454656 396 0 sssd_nss [92599.976240][ T2588] [ 11331] 0 11331 54424 1 434176 610 0 sssd_be [92600.017106][ T2588] [ 25247] 0 25247 25746 1 212992 300 -1000 systemd-udevd [92600.060539][ T2588] [ 25391] 0 25391 2184 0 65536 32 0 oom01 [92600.100648][ T2588] [ 25392] 0 25392 2184 0 65536 39 0 oom01 [92600.143516][ T2588] oom- kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0- 1,global_oom,task_memcg=/system.slice/tuned.service,task=tuned,pid=2629,uid=0 [92600.213724][ T2588] Out of memory: Killed process 2629 (tuned) total- vm:424936kB, anon-rss:328kB, file-rss:1268kB, shmem-rss:0kB, UID:0 pgtables:442368kB oom_score_adj:0 [92600.297832][ T305] oom_reaper: reaped process 2629 (tuned), now anon- rss:0kB, file-rss:0kB, shmem-rss:0kB > > > I just > > have not had a chance to debug them fully. The situation could be worse with > > more complex allocations like random stress or fuzzy testing. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] mm, oom: disable dump_tasks by default 2019-09-03 14:45 [RFC PATCH] mm, oom: disable dump_tasks by default Michal Hocko 2019-09-03 15:02 ` Qian Cai @ 2019-09-03 20:52 ` Tetsuo Handa 2019-09-04 5:40 ` Michal Hocko 1 sibling, 1 reply; 14+ messages in thread From: Tetsuo Handa @ 2019-09-03 20:52 UTC (permalink / raw) To: Michal Hocko, linux-mm; +Cc: Andrew Morton, David Rientjes, LKML, Michal Hocko On 2019/09/03 23:45, Michal Hocko wrote: > It's primary purpose is > to help analyse oom victim selection decision. I disagree, for I use the process list for understanding what / how many processes are consuming what kind of memory (without crashing the system) for anomaly detection purpose. Although we can't dump memory consumed by e.g. file descriptors, disabling dump_tasks() loose that clue, and is problematic for me. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] mm, oom: disable dump_tasks by default 2019-09-03 20:52 ` Tetsuo Handa @ 2019-09-04 5:40 ` Michal Hocko 2019-09-04 20:04 ` David Rientjes 0 siblings, 1 reply; 14+ messages in thread From: Michal Hocko @ 2019-09-04 5:40 UTC (permalink / raw) To: Tetsuo Handa; +Cc: linux-mm, Andrew Morton, David Rientjes, LKML On Wed 04-09-19 05:52:43, Tetsuo Handa wrote: > On 2019/09/03 23:45, Michal Hocko wrote: > > It's primary purpose is > > to help analyse oom victim selection decision. > > I disagree, for I use the process list for understanding what / how many > processes are consuming what kind of memory (without crashing the system) > for anomaly detection purpose. Although we can't dump memory consumed by > e.g. file descriptors, disabling dump_tasks() loose that clue, and is > problematic for me. Does anything really prevent you from enabling this by sysctl though? Or do you claim that this is a general usage pattern and therefore the default change is not acceptable or do you want a changelog to be updated? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] mm, oom: disable dump_tasks by default 2019-09-04 5:40 ` Michal Hocko @ 2019-09-04 20:04 ` David Rientjes 2019-09-05 13:39 ` Tetsuo Handa 0 siblings, 1 reply; 14+ messages in thread From: David Rientjes @ 2019-09-04 20:04 UTC (permalink / raw) To: Michal Hocko; +Cc: Tetsuo Handa, linux-mm, Andrew Morton, LKML On Wed, 4 Sep 2019, Michal Hocko wrote: > > > It's primary purpose is > > > to help analyse oom victim selection decision. > > > > I disagree, for I use the process list for understanding what / how many > > processes are consuming what kind of memory (without crashing the system) > > for anomaly detection purpose. Although we can't dump memory consumed by > > e.g. file descriptors, disabling dump_tasks() loose that clue, and is > > problematic for me. > > Does anything really prevent you from enabling this by sysctl though? Or > do you claim that this is a general usage pattern and therefore the > default change is not acceptable or do you want a changelog to be > updated? > I think the motivation is that users don't want to need to reproduce an oom kill to figure out why: they want to be able to figure out which process had higher than normal memory usage. If oom is the normal case then they ceratinly have the ability to disable it by disabling the sysctl, but that seems like something better to opt-out of rather than need to opt-in to and reproduce. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] mm, oom: disable dump_tasks by default 2019-09-04 20:04 ` David Rientjes @ 2019-09-05 13:39 ` Tetsuo Handa 2019-09-05 14:08 ` Michal Hocko 0 siblings, 1 reply; 14+ messages in thread From: Tetsuo Handa @ 2019-09-05 13:39 UTC (permalink / raw) To: David Rientjes, Michal Hocko; +Cc: linux-mm, Andrew Morton, LKML On 2019/09/05 5:04, David Rientjes wrote: > On Wed, 4 Sep 2019, Michal Hocko wrote: > >>>> It's primary purpose is >>>> to help analyse oom victim selection decision. >>> >>> I disagree, for I use the process list for understanding what / how many >>> processes are consuming what kind of memory (without crashing the system) >>> for anomaly detection purpose. Although we can't dump memory consumed by >>> e.g. file descriptors, disabling dump_tasks() loose that clue, and is >>> problematic for me. >> >> Does anything really prevent you from enabling this by sysctl though? Or >> do you claim that this is a general usage pattern and therefore the >> default change is not acceptable or do you want a changelog to be >> updated? >> > > I think the motivation is that users don't want to need to reproduce an > oom kill to figure out why: they want to be able to figure out which > process had higher than normal memory usage. Right. Users can't reproduce an OOM kill to figure out why. Those who do failure analysis for users (e.g. technical staff at support center) need to figure out system's state when an OOM kill happened. And installing dynamic hooks like SystemTap / eBPF is hardly acceptable for users. What I don't like is that Michal refuses to solve OOM stalling problem, does not try to understand how difficult to avoid problems caused by thoughtless printk(), and instead recommending to disable oom_dump_tasks. There is nothing that prevents users from enabling oom_dump_tasks by sysctl. But that requires a solution for OOM stalling problem. Since I know how difficult to avoid problems caused by printk() flooding, I insist that we need "mm,oom: Defer dump_tasks() output." patch. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] mm, oom: disable dump_tasks by default 2019-09-05 13:39 ` Tetsuo Handa @ 2019-09-05 14:08 ` Michal Hocko 2019-09-06 10:46 ` Tetsuo Handa 0 siblings, 1 reply; 14+ messages in thread From: Michal Hocko @ 2019-09-05 14:08 UTC (permalink / raw) To: Tetsuo Handa; +Cc: David Rientjes, linux-mm, Andrew Morton, LKML On Thu 05-09-19 22:39:47, Tetsuo Handa wrote: [...] > There is nothing that prevents users from enabling oom_dump_tasks by sysctl. > But that requires a solution for OOM stalling problem. You can hardly remove stalling if you are not reducing the amount of output or get it into a different context. Whether the later is reasonable is another question but you are essentially losing "at the OOM event state". > Since I know how > difficult to avoid problems caused by printk() flooding, I insist that > we need "mm,oom: Defer dump_tasks() output." patch. insisting is not a way to cooperate. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] mm, oom: disable dump_tasks by default 2019-09-05 14:08 ` Michal Hocko @ 2019-09-06 10:46 ` Tetsuo Handa 2019-09-06 11:02 ` Michal Hocko 0 siblings, 1 reply; 14+ messages in thread From: Tetsuo Handa @ 2019-09-06 10:46 UTC (permalink / raw) To: Michal Hocko; +Cc: David Rientjes, linux-mm, Andrew Morton, LKML On 2019/09/05 23:08, Michal Hocko wrote: > On Thu 05-09-19 22:39:47, Tetsuo Handa wrote: > [...] >> There is nothing that prevents users from enabling oom_dump_tasks by sysctl. >> But that requires a solution for OOM stalling problem. > > You can hardly remove stalling if you are not reducing the amount of > output or get it into a different context. Whether the later is > reasonable is another question but you are essentially losing "at the > OOM event state". > I am not losing "at the OOM event state". Please find "struct oom_task_info" (for now) embedded into "struct task_struct" which holds "at the OOM event state". And my patch moves "printk() from dump_tasks()" from OOM context to WQ context. Thus, I do remove stalling by defer printing of "struct oom_task_info" until the OOM killer sends SIGKILL and the OOM reaper starts reclaiming memory. --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -639,6 +639,21 @@ struct wake_q_node { struct wake_q_node *next; }; +/* Memory usage and misc info as of invocation of OOM killer. */ +struct oom_task_info { + struct list_head list; + unsigned int seq; + char comm[TASK_COMM_LEN]; + pid_t pid; + uid_t uid; + pid_t tgid; + unsigned long total_vm; + unsigned long mm_rss; + unsigned long pgtables_bytes; + unsigned long swapents; + int score_adj; +}; + struct task_struct { #ifdef CONFIG_THREAD_INFO_IN_TASK /* @@ -1260,7 +1275,7 @@ struct task_struct { #ifdef CONFIG_MMU struct task_struct *oom_reaper_list; #endif - struct list_head oom_victim_list; + struct oom_task_info oom_task_info; #ifdef CONFIG_VMAP_STACK struct vm_struct *stack_vm_area; #endif ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] mm, oom: disable dump_tasks by default 2019-09-06 10:46 ` Tetsuo Handa @ 2019-09-06 11:02 ` Michal Hocko 2019-09-06 11:11 ` Tetsuo Handa 0 siblings, 1 reply; 14+ messages in thread From: Michal Hocko @ 2019-09-06 11:02 UTC (permalink / raw) To: Tetsuo Handa; +Cc: David Rientjes, linux-mm, Andrew Morton, LKML On Fri 06-09-19 19:46:10, Tetsuo Handa wrote: > On 2019/09/05 23:08, Michal Hocko wrote: > > On Thu 05-09-19 22:39:47, Tetsuo Handa wrote: > > [...] > >> There is nothing that prevents users from enabling oom_dump_tasks by sysctl. > >> But that requires a solution for OOM stalling problem. > > > > You can hardly remove stalling if you are not reducing the amount of > > output or get it into a different context. Whether the later is > > reasonable is another question but you are essentially losing "at the > > OOM event state". > > > > I am not losing "at the OOM event state". Please find "struct oom_task_info" > (for now) embedded into "struct task_struct" which holds "at the OOM event state". > > And my patch moves "printk() from dump_tasks()" from OOM context to WQ context. Workers might be blocked for unbound amount of time and so this information might be printed late. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH] mm, oom: disable dump_tasks by default 2019-09-06 11:02 ` Michal Hocko @ 2019-09-06 11:11 ` Tetsuo Handa 0 siblings, 0 replies; 14+ messages in thread From: Tetsuo Handa @ 2019-09-06 11:11 UTC (permalink / raw) To: Michal Hocko; +Cc: David Rientjes, linux-mm, Andrew Morton, LKML On 2019/09/06 20:02, Michal Hocko wrote: > On Fri 06-09-19 19:46:10, Tetsuo Handa wrote: >> On 2019/09/05 23:08, Michal Hocko wrote: >>> On Thu 05-09-19 22:39:47, Tetsuo Handa wrote: >>> [...] >>>> There is nothing that prevents users from enabling oom_dump_tasks by sysctl. >>>> But that requires a solution for OOM stalling problem. >>> >>> You can hardly remove stalling if you are not reducing the amount of >>> output or get it into a different context. Whether the later is >>> reasonable is another question but you are essentially losing "at the >>> OOM event state". >>> >> >> I am not losing "at the OOM event state". Please find "struct oom_task_info" >> (for now) embedded into "struct task_struct" which holds "at the OOM event state". >> >> And my patch moves "printk() from dump_tasks()" from OOM context to WQ context. > > Workers might be blocked for unbound amount of time and so this > information might be printed late. > Yes, but the OOM reaper will quickly reclaim memory. And if WQ is blocked, new WQ for processing this work will be created (because OOM situation is quickly solved). Nonetheless if your worry turns out to be a real problem, we can use a dedicated WQ or offload to e.g. the OOM reaper kernel thread. Anyway, such tuning is beyond the scope of my patch. ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2019-09-06 11:11 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-09-03 14:45 [RFC PATCH] mm, oom: disable dump_tasks by default Michal Hocko 2019-09-03 15:02 ` Qian Cai 2019-09-03 15:13 ` Michal Hocko 2019-09-03 15:32 ` Qian Cai 2019-09-03 19:12 ` Michal Hocko 2019-09-05 16:10 ` Qian Cai 2019-09-03 20:52 ` Tetsuo Handa 2019-09-04 5:40 ` Michal Hocko 2019-09-04 20:04 ` David Rientjes 2019-09-05 13:39 ` Tetsuo Handa 2019-09-05 14:08 ` Michal Hocko 2019-09-06 10:46 ` Tetsuo Handa 2019-09-06 11:02 ` Michal Hocko 2019-09-06 11:11 ` Tetsuo Handa
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).