* Re: [PATCH] mm/oom: Add oom_score_adj value to oom Killed process message
@ 2019-08-21 7:19 ` David Rientjes
0 siblings, 0 replies; 26+ messages in thread
From: David Rientjes @ 2019-08-21 7:19 UTC (permalink / raw)
To: Michal Hocko, Edward Chron
Cc: Andrew Morton, Roman Gushchin, Johannes Weiner, Tetsuo Handa,
Shakeel Butt, linux-mm, linux-kernel, colona
On Wed, 21 Aug 2019, Michal Hocko wrote:
> > vm.oom_dump_tasks is pretty useful, however, so it's curious why you
> > haven't left it enabled :/
>
> Because it generates a lot of output potentially. Think of a workload
> with too many tasks which is not uncommon.
Probably better to always print all the info for the victim so we don't
need to duplicate everything between dump_tasks() and dump_oom_summary().
Edward, how about this?
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -420,11 +420,17 @@ static int dump_task(struct task_struct *p, void *arg)
* State information includes task's pid, uid, tgid, vm size, rss,
* pgtables_bytes, swapents, oom_score_adj value, and name.
*/
-static void dump_tasks(struct oom_control *oc)
+static void dump_tasks(struct oom_control *oc, struct task_struct *victim)
{
pr_info("Tasks state (memory values in pages):\n");
pr_info("[ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name\n");
+ /* If vm.oom_dump_tasks is disabled, only show the victim */
+ if (!sysctl_oom_dump_tasks) {
+ dump_task(victim, oc);
+ return;
+ }
+
if (is_memcg_oom(oc))
mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
else {
@@ -465,8 +471,8 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
if (is_dump_unreclaim_slabs())
dump_unreclaimable_slab();
}
- if (sysctl_oom_dump_tasks)
- dump_tasks(oc);
+ if (p || sysctl_oom_dump_tasks)
+ dump_tasks(oc, p);
if (p)
dump_oom_summary(oc, p);
}
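The gating this diff introduces can be modeled outside the kernel. The sketch below is plain C with stand-in types (the sysctl flag, task struct, and return enum are simplifications invented here, not the kernel's real interfaces); it only illustrates which path prints what:

```c
#include <stddef.h>

/* Stand-ins for the kernel's sysctl knob and task handle. */
static int sysctl_oom_dump_tasks = 1;
struct task { const char *comm; };

enum dump_kind { DUMP_NONE, DUMP_VICTIM_ONLY, DUMP_ALL_TASKS };

/* Mirrors the patched dump_tasks(): when vm.oom_dump_tasks is off,
 * only the victim's state line is emitted. */
static enum dump_kind dump_tasks(const struct task *victim)
{
	if (!sysctl_oom_dump_tasks)
		return victim ? DUMP_VICTIM_ONLY : DUMP_NONE;
	return DUMP_ALL_TASKS;
}

/* Mirrors the patched dump_header() condition: p || sysctl_oom_dump_tasks. */
static enum dump_kind dump_header(const struct task *p)
{
	if (p || sysctl_oom_dump_tasks)
		return dump_tasks(p);
	return DUMP_NONE;
}
```

With the sysctl disabled and a victim present, only the victim line is printed; with it enabled, the full task list is dumped as before.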
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] mm/oom: Add oom_score_adj value to oom Killed process message
2019-08-21 7:19 ` David Rientjes
@ 2019-08-21 7:47 ` Michal Hocko
2019-08-21 23:12 ` Edward Chron
-1 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2019-08-21 7:47 UTC (permalink / raw)
To: David Rientjes
Cc: Edward Chron, Andrew Morton, Roman Gushchin, Johannes Weiner,
Tetsuo Handa, Shakeel Butt, linux-mm, linux-kernel, colona
On Wed 21-08-19 00:19:37, David Rientjes wrote:
> On Wed, 21 Aug 2019, Michal Hocko wrote:
>
> > > vm.oom_dump_tasks is pretty useful, however, so it's curious why you
> > > haven't left it enabled :/
> >
> > Because it generates a lot of output potentially. Think of a workload
> > with too many tasks which is not uncommon.
>
> Probably better to always print all the info for the victim so we don't
> need to duplicate everything between dump_tasks() and dump_oom_summary().
I believe that the motivation was to have a one-line summary that is
already parsed by log consumers. And that one is in __oom_kill_process.
Also, I do not think this patch improves things much, for two reasons
at least: a) it doesn't really give you the whole list of killed tasks
(this might be the whole memcg), and b) we already have the most important
information in __oom_kill_process. If something is missing there I do
not see a strong reason we cannot add it there. Like in this case.
> Edward, how about this?
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -420,11 +420,17 @@ static int dump_task(struct task_struct *p, void *arg)
> * State information includes task's pid, uid, tgid, vm size, rss,
> * pgtables_bytes, swapents, oom_score_adj value, and name.
> */
> -static void dump_tasks(struct oom_control *oc)
> +static void dump_tasks(struct oom_control *oc, struct task_struct *victim)
> {
> pr_info("Tasks state (memory values in pages):\n");
> pr_info("[ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name\n");
>
> + /* If vm.oom_dump_tasks is disabled, only show the victim */
> + if (!sysctl_oom_dump_tasks) {
> + dump_task(victim, oc);
> + return;
> + }
> +
> if (is_memcg_oom(oc))
> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> else {
> @@ -465,8 +471,8 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
> if (is_dump_unreclaim_slabs())
> dump_unreclaimable_slab();
> }
> - if (sysctl_oom_dump_tasks)
> - dump_tasks(oc);
> + if (p || sysctl_oom_dump_tasks)
> + dump_tasks(oc, p);
> if (p)
> dump_oom_summary(oc, p);
> }
--
Michal Hocko
SUSE Labs
* Re: [PATCH] mm/oom: Add oom_score_adj value to oom Killed process message
2019-08-21 7:47 ` Michal Hocko
@ 2019-08-21 23:12 ` Edward Chron
0 siblings, 0 replies; 26+ messages in thread
From: Edward Chron @ 2019-08-21 23:12 UTC (permalink / raw)
To: Michal Hocko
Cc: David Rientjes, Andrew Morton, Roman Gushchin, Johannes Weiner,
Tetsuo Handa, Shakeel Butt, linux-mm, linux-kernel,
Ivan Delalande
On Wed, Aug 21, 2019 at 12:47 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 21-08-19 00:19:37, David Rientjes wrote:
> > On Wed, 21 Aug 2019, Michal Hocko wrote:
> >
> > > > vm.oom_dump_tasks is pretty useful, however, so it's curious why you
> > > > haven't left it enabled :/
> > >
> > > Because it generates a lot of output potentially. Think of a workload
> > > with too many tasks which is not uncommon.
> >
> > Probably better to always print all the info for the victim so we don't
> > need to duplicate everything between dump_tasks() and dump_oom_summary().
>
> I believe that the motivation was to have a one-line summary that is
> already parsed by log consumers. And that one is in __oom_kill_process.
>
Yes, the motivation was the one-line summary that the OOM Killed process
message supplies, along with the fact that it is printed at error
priority, as I mentioned. That makes it a very desirable place to put
summarized information.
> Also, I do not think this patch improves things much, for two reasons
> at least: a) it doesn't really give you the whole list of killed tasks
> (this might be the whole memcg), and b) we already have the most important
> information in __oom_kill_process. If something is missing there I do
> not see a strong reason we cannot add it there. Like in this case.
>
This is a good point.

Additionally (which you know, but mentioning for reference) the OOM
output used to look like this:

Nov 14 15:23:48 oldserver kernel: [337631.991218] Out of memory: Kill process 19961 (python) score 17 or sacrifice child
Nov 14 15:23:48 oldserver kernel: [337631.991237] Killed process 31357 (sh) total-vm:5400kB, anon-rss:252kB, file-rss:4kB, shmem-rss:0kB

It now looks like this with 5.3.0-rc5 (minus the oom_score_adj):

Jul 22 10:42:40 newserver kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-10383.slice/user@10383.service,task=oomprocs,pid=3035,uid=10383
Jul 22 10:42:40 newserver kernel: Out of memory: Killed process 3035 (oomprocs) total-vm:1056800kB, anon-rss:8kB, file-rss:4kB, shmem-rss:0kB
Jul 22 10:42:40 newserver kernel: oom_reaper: reaped process 3035 (oomprocs), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

The old output did explain that an oom_score of 17 must have either
tied for or been the highest. This documented why the OOM killer
selected the process it did, even if it ends up killing the related sh
process.

With the newer format that added the constraint message, it does
provide the uid, which can be helpful, and the oom_reaper showing that
the memory was reclaimed is certainly reassuring.

My understanding now is that printing the oom_score is discouraged.
This seems unfortunate. The oom_score_adj can be adjusted appropriately
if the oom_score is known, so it would be useful to have both.
But at least if oom_score_adj is printed you can confirm its value at
the time of the OOM event.
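Since the killed-process line is the one that log consumers parse, the extraction step on the consumer side can be sketched as below. The format string simply mirrors the 5.3-style example line quoted above; the helper itself is hypothetical, and a real consumer would have to track the kernel's exact wording across releases:

```c
#include <stdio.h>
#include <string.h>

/* Pull the pid, comm and total-vm figure out of a 5.3-style
 * "Out of memory: Killed process ..." line. Returns 0 on success,
 * -1 if the line does not match the expected shape. */
static int parse_oom_kill(const char *line, int *pid, char *comm,
			  size_t comm_len, long *total_vm_kb)
{
	const char *p = strstr(line, "Out of memory: Killed process ");
	char buf[64];

	if (!p)
		return -1;
	p += strlen("Out of memory: Killed process ");
	if (sscanf(p, "%d (%63[^)]) total-vm:%ldkB", pid, buf, total_vm_kb) != 3)
		return -1;
	snprintf(comm, comm_len, "%s", buf);
	return 0;
}
```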
Thank-you,
-Edward Chron
Arista Networks
> > Edward, how about this?
> >
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -420,11 +420,17 @@ static int dump_task(struct task_struct *p, void *arg)
> > * State information includes task's pid, uid, tgid, vm size, rss,
> > * pgtables_bytes, swapents, oom_score_adj value, and name.
> > */
> > -static void dump_tasks(struct oom_control *oc)
> > +static void dump_tasks(struct oom_control *oc, struct task_struct *victim)
> > {
> > pr_info("Tasks state (memory values in pages):\n");
> > pr_info("[ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name\n");
> >
> > + /* If vm.oom_dump_tasks is disabled, only show the victim */
> > + if (!sysctl_oom_dump_tasks) {
> > + dump_task(victim, oc);
> > + return;
> > + }
> > +
> > if (is_memcg_oom(oc))
> > mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> > else {
> > @@ -465,8 +471,8 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
> > if (is_dump_unreclaim_slabs())
> > dump_unreclaimable_slab();
> > }
> > - if (sysctl_oom_dump_tasks)
> > - dump_tasks(oc);
> > + if (p || sysctl_oom_dump_tasks)
> > + dump_tasks(oc, p);
> > if (p)
> > dump_oom_summary(oc, p);
> > }
>
> --
> Michal Hocko
> SUSE Labs
* Re: [PATCH] mm/oom: Add oom_score_adj value to oom Killed process message
2019-08-21 23:12 ` Edward Chron
@ 2019-08-22 7:21 ` Michal Hocko
2019-08-22 14:55 ` Edward Chron
-1 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2019-08-22 7:21 UTC (permalink / raw)
To: Edward Chron
Cc: David Rientjes, Andrew Morton, Roman Gushchin, Johannes Weiner,
Tetsuo Handa, Shakeel Butt, linux-mm, linux-kernel,
Ivan Delalande
On Wed 21-08-19 16:12:08, Edward Chron wrote:
[...]
> Additionally (which you know, but mentioning for reference) the OOM
> output used to look like this:
>
> Nov 14 15:23:48 oldserver kernel: [337631.991218] Out of memory: Kill process 19961 (python) score 17 or sacrifice child
> Nov 14 15:23:48 oldserver kernel: [337631.991237] Killed process 31357 (sh) total-vm:5400kB, anon-rss:252kB, file-rss:4kB, shmem-rss:0kB
>
> It now looks like this with 5.3.0-rc5 (minus the oom_score_adj):
>
> Jul 22 10:42:40 newserver kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-10383.slice/user@10383.service,task=oomprocs,pid=3035,uid=10383
> Jul 22 10:42:40 newserver kernel: Out of memory: Killed process 3035 (oomprocs) total-vm:1056800kB, anon-rss:8kB, file-rss:4kB, shmem-rss:0kB
> Jul 22 10:42:40 newserver kernel: oom_reaper: reaped process 3035 (oomprocs), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>
> The old output did explain that an oom_score of 17 must have either
> tied for or been the highest. This documented why the OOM killer
> selected the process it did, even if it ends up killing the related
> sh process.
>
> With the newer format that added the constraint message, it does
> provide the uid, which can be helpful, and the oom_reaper showing
> that the memory was reclaimed is certainly reassuring.
>
> My understanding now is that printing the oom_score is discouraged.
> This seems unfortunate. The oom_score_adj can be adjusted
> appropriately if the oom_score is known, so it would be useful to
> have both.
As already mentioned in our previous discussion, I am really not happy
about exporting oom_score without a larger context - aka other tasks'
scores to have something to compare against. Other than that, the value
is an internal implementation detail and it is meaningless without
knowing the exact algorithm, which can change at any time, so no
userspace should really depend on it. All important metrics should be
displayed by the oom report message already.
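For reference, the way oom_score_adj biases selection can be approximated as below. This is a deliberately simplified model of the mm/oom_kill.c badness accounting of this era (footprint in pages plus an adj/1000 fraction of total pages), not the kernel's exact code, which clamps and normalizes somewhat differently:

```c
/* Approximate badness: memory footprint in pages, biased by
 * oom_score_adj as a fraction (adj/1000) of total system pages.
 * A simplified model of the kernel heuristic; negative results
 * are clamped to zero here. */
static long badness(long rss_pages, long swap_pages, long pgtable_pages,
		    int oom_score_adj, long totalpages)
{
	long points = rss_pages + swap_pages + pgtable_pages;

	points += (long)oom_score_adj * totalpages / 1000;
	return points > 0 ? points : 0;
}
```

The point Michal makes follows directly: the adj term scales with totalpages, so a raw score is only meaningful relative to the other candidates' scores and to the algorithm in use.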
--
Michal Hocko
SUSE Labs
* Re: [PATCH] mm/oom: Add oom_score_adj value to oom Killed process message
2019-08-22 7:21 ` Michal Hocko
@ 2019-08-22 14:55 ` Edward Chron
0 siblings, 0 replies; 26+ messages in thread
From: Edward Chron @ 2019-08-22 14:55 UTC (permalink / raw)
To: Michal Hocko
Cc: David Rientjes, Andrew Morton, Roman Gushchin, Johannes Weiner,
Tetsuo Handa, Shakeel Butt, linux-mm, linux-kernel,
Ivan Delalande
On Thu, Aug 22, 2019 at 12:21 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 21-08-19 16:12:08, Edward Chron wrote:
> [...]
> > Additionally (which you know, but mentioning for reference) the OOM
> > output used to look like this:
> >
> > Nov 14 15:23:48 oldserver kernel: [337631.991218] Out of memory: Kill process 19961 (python) score 17 or sacrifice child
> > Nov 14 15:23:48 oldserver kernel: [337631.991237] Killed process 31357 (sh) total-vm:5400kB, anon-rss:252kB, file-rss:4kB, shmem-rss:0kB
> >
> > It now looks like this with 5.3.0-rc5 (minus the oom_score_adj):
> >
> > Jul 22 10:42:40 newserver kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-10383.slice/user@10383.service,task=oomprocs,pid=3035,uid=10383
> > Jul 22 10:42:40 newserver kernel: Out of memory: Killed process 3035 (oomprocs) total-vm:1056800kB, anon-rss:8kB, file-rss:4kB, shmem-rss:0kB
> > Jul 22 10:42:40 newserver kernel: oom_reaper: reaped process 3035 (oomprocs), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> >
> > The old output did explain that an oom_score of 17 must have either
> > tied for or been the highest. This documented why the OOM killer
> > selected the process it did, even if it ends up killing the related
> > sh process.
> >
> > With the newer format that added the constraint message, it does
> > provide the uid, which can be helpful, and the oom_reaper showing
> > that the memory was reclaimed is certainly reassuring.
> >
> > My understanding now is that printing the oom_score is discouraged.
> > This seems unfortunate. The oom_score_adj can be adjusted
> > appropriately if the oom_score is known, so it would be useful to
> > have both.
>
> As already mentioned in our previous discussion, I am really not happy
> about exporting oom_score without a larger context - aka other tasks'
> scores to have something to compare against. Other than that, the value
> is an internal implementation detail and it is meaningless without
> knowing the exact algorithm, which can change at any time, so no
> userspace should really depend on it. All important metrics should be
> displayed by the oom report message already.
The oom_score is no longer displayed anywhere in the OOM output with
5.3, so there isn't anything to compare against anyway with the current
OOM per-process output and for the killed process.
I understand the reasoning for this from your discussion.
Thanks for explaining the rationale.
>
> --
> Michal Hocko
> SUSE Labs
* Re: [PATCH] mm/oom: Add oom_score_adj value to oom Killed process message
2019-08-21 7:19 ` David Rientjes
@ 2019-08-21 22:22 ` Edward Chron
-1 siblings, 0 replies; 26+ messages in thread
From: Edward Chron @ 2019-08-21 22:22 UTC (permalink / raw)
To: David Rientjes
Cc: Michal Hocko, Andrew Morton, Roman Gushchin, Johannes Weiner,
Tetsuo Handa, Shakeel Butt, linux-mm, linux-kernel,
Ivan Delalande
On Wed, Aug 21, 2019 at 12:19 AM David Rientjes <rientjes@google.com> wrote:
>
> On Wed, 21 Aug 2019, Michal Hocko wrote:
>
> > > vm.oom_dump_tasks is pretty useful, however, so it's curious why you
> > > haven't left it enabled :/
> >
> > Because it generates a lot of output potentially. Think of a workload
> > with too many tasks which is not uncommon.
>
> Probably better to always print all the info for the victim so we don't
> need to duplicate everything between dump_tasks() and dump_oom_summary().
>
> Edward, how about this?
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -420,11 +420,17 @@ static int dump_task(struct task_struct *p, void *arg)
> * State information includes task's pid, uid, tgid, vm size, rss,
> * pgtables_bytes, swapents, oom_score_adj value, and name.
> */
> -static void dump_tasks(struct oom_control *oc)
> +static void dump_tasks(struct oom_control *oc, struct task_struct *victim)
> {
> pr_info("Tasks state (memory values in pages):\n");
> pr_info("[ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name\n");
>
> + /* If vm.oom_dump_tasks is disabled, only show the victim */
> + if (!sysctl_oom_dump_tasks) {
> + dump_task(victim, oc);
> + return;
> + }
> +
> if (is_memcg_oom(oc))
> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> else {
> @@ -465,8 +471,8 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
> if (is_dump_unreclaim_slabs())
> dump_unreclaimable_slab();
> }
> - if (sysctl_oom_dump_tasks)
> - dump_tasks(oc);
> + if (p || sysctl_oom_dump_tasks)
> + dump_tasks(oc, p);
> if (p)
> dump_oom_summary(oc, p);
> }
I would be willing to accept this, though as Michal mentions in his
post, it would be very helpful to have the oom_score_adj on the Killed
process message.

One reason for that is that the Killed process message is the one
message that is printed with error priority (pr_err), and so that
message can be filtered out and sent to notify support that an OOM
event occurred. Putting any information that can be shared in that
message is useful; from my experience it is the initial point of triage
for an OOM event. Even if the full log with per-process output is
available, it is the starting point for triage for an OOM event.

So from my perspective I would be happy having both, with David's
proposal providing a bit of extra information as shown here:

Jul 21 20:07:48 linuxserver kernel: [ pid ]   uid  tgid total_vm   rss pgtables_bytes swapents oom_score_adj name
Jul 21 20:07:48 linuxserver kernel: [  547]     0   547    31664   615         299008        0             0 systemd-journal

The OOM Killed process message will print as:

Jul 21 20:07:48 linuxserver kernel: Out of memory: Killed process 2826 (oomprocs) total-vm:1056800kB, anon-rss:1052784kB, file-rss:4kB, shmem-rss:0kB oom_score_adj:1000

But if only one output change is allowed I'd favor the Killed process
message, since that one can be singled out due to its print priority
and forwarded.

By the way, right now there is redundancy in that the Killed process
message prints vm and rss even if vm.oom_dump_tasks is enabled.
I don't see why that is a big deal; it is very useful to have all the
information that is there. Wouldn't mind also having pgtables too, but
we would be able to get that from the output of dump_task if that is
enabled.

If it is acceptable to also add the dump_task output for the killed
process for !sysctl_oom_dump_tasks, I can repost the patch including
that as well.
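The priority-based filtering described above can be done by a consumer
reading raw kernel log records, where each record begins with a numeric
prefix whose low three bits are the syslog severity (KERN_ERR is 3). A
minimal sketch, assuming the "<pri>,<seq>,<ts>,...;<text>" record shape
of /dev/kmsg; the helper names are made up for illustration:

```c
#include <stdlib.h>

#define LOG_ERR 3

/* Return the syslog severity of a kmsg-style record ("6,1234,...;text"):
 * the leading number's low three bits. Returns -1 if no numeric prefix
 * followed by a comma is found. */
static int kmsg_severity(const char *record)
{
	char *end;
	long pri = strtol(record, &end, 10);

	if (end == record || *end != ',')
		return -1;
	return (int)(pri & 7);
}

/* Keep only err-or-worse records, the way a consumer forwarding the
 * pr_err "Killed process" line would. */
static int is_err_or_worse(const char *record)
{
	int sev = kmsg_severity(record);

	return sev >= 0 && sev <= LOG_ERR;
}
```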
Thank-you,
Edward Chron
Arista Networks
* Re: [PATCH] mm/oom: Add oom_score_adj value to oom Killed process message
2019-08-21 22:22 ` Edward Chron
@ 2019-08-22 7:15 ` Michal Hocko
2019-08-22 14:47 ` Edward Chron
-1 siblings, 1 reply; 26+ messages in thread
From: Michal Hocko @ 2019-08-22 7:15 UTC (permalink / raw)
To: Edward Chron
Cc: David Rientjes, Andrew Morton, Roman Gushchin, Johannes Weiner,
Tetsuo Handa, Shakeel Butt, linux-mm, linux-kernel,
Ivan Delalande
On Wed 21-08-19 15:22:07, Edward Chron wrote:
> On Wed, Aug 21, 2019 at 12:19 AM David Rientjes <rientjes@google.com> wrote:
> >
> > On Wed, 21 Aug 2019, Michal Hocko wrote:
> >
> > > > vm.oom_dump_tasks is pretty useful, however, so it's curious why you
> > > > haven't left it enabled :/
> > >
> > > Because it generates a lot of output potentially. Think of a workload
> > > with too many tasks which is not uncommon.
> >
> > Probably better to always print all the info for the victim so we don't
> > need to duplicate everything between dump_tasks() and dump_oom_summary().
> >
> > Edward, how about this?
> >
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -420,11 +420,17 @@ static int dump_task(struct task_struct *p, void *arg)
> > * State information includes task's pid, uid, tgid, vm size, rss,
> > * pgtables_bytes, swapents, oom_score_adj value, and name.
> > */
> > -static void dump_tasks(struct oom_control *oc)
> > +static void dump_tasks(struct oom_control *oc, struct task_struct *victim)
> > {
> > pr_info("Tasks state (memory values in pages):\n");
> > pr_info("[ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name\n");
> >
> > + /* If vm.oom_dump_tasks is disabled, only show the victim */
> > + if (!sysctl_oom_dump_tasks) {
> > + dump_task(victim, oc);
> > + return;
> > + }
> > +
> > if (is_memcg_oom(oc))
> > mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> > else {
> > @@ -465,8 +471,8 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
> > if (is_dump_unreclaim_slabs())
> > dump_unreclaimable_slab();
> > }
> > - if (sysctl_oom_dump_tasks)
> > - dump_tasks(oc);
> > + if (p || sysctl_oom_dump_tasks)
> > + dump_tasks(oc, p);
> > if (p)
> > dump_oom_summary(oc, p);
> > }
>
> I would be willing to accept this, though as Michal mentions in his
> post, it would be very helpful to have the oom_score_adj on the Killed
> process message.
>
> One reason for that is that the Killed process message is the one
> message that is printed with error priority (pr_err)
> and so that message can be filtered out and sent to notify support
> that an OOM event occurred.
> Putting any information that can be shared in that message is useful;
> from my experience it is the initial point of triage for an OOM event,
> even when the full per-process log is available.
>
> So from my perspective I would be happy having both, with David's
> proposal providing a bit of extra information as shown here:
>
> Jul 21 20:07:48 linuxserver kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
> Jul 21 20:07:48 linuxserver kernel: [    547]     0   547    31664      615         299008        0             0 systemd-journal
>
> The OOM Killed process message will print as:
>
> Jul 21 20:07:48 linuxserver kernel: Out of memory: Killed process 2826 (oomprocs) total-vm:1056800kB, anon-rss:1052784kB, file-rss:4kB, shmem-rss:0kB oom_score_adj:1000
>
> But if only one output change is allowed, I'd favor the Killed
> process message, since it can be singled out by its print priority
> and forwarded.
>
> By the way, right now there is redundancy in that the Killed process
> message prints total-vm and rss even when vm.oom_dump_tasks is enabled.
> I don't see why that is a big deal.
There will always be redundancy there, because the dump_tasks part is
there mostly to sanity-check the OOM victim selection for wrong or
unexpected choices, while the "Killed..." message is there to report who
has been killed. Most people really only care about that part.
> It is very useful to have all the information that is there.
> I wouldn't mind also having pgtables, but we would be able to get
> that from the output of dump_task if that is enabled.
I am not against adding pgtable information there. That memory is going
to be released when the task dies.
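To make concrete what "adding the missing pieces to the killed task message" could look like, here is a small userspace sketch. It is not the actual mm/oom_kill.c change; the struct and helper are hypothetical, and pr_err() is emulated with snprintf(). The field order follows the message format already quoted in this thread, with pgtables appended:

```c
#include <stdio.h>
#include <string.h>  /* string.h/assert.h are only needed for demo checks */
#include <assert.h>

/* Hypothetical victim stats -- field names mirror the existing
 * "Out of memory: Killed process ..." format, with pgtables added. */
struct victim_stats {
	int pid;
	const char *comm;
	long total_vm_kb, anon_rss_kb, file_rss_kb, shmem_rss_kb;
	long pgtables_kb;
	int oom_score_adj;
};

/* Sketch of what the extended pr_err() line could look like; this is
 * an illustration, not code merged into the kernel. */
int format_kill_msg(char *buf, size_t len, const struct victim_stats *v)
{
	return snprintf(buf, len,
		"Out of memory: Killed process %d (%s) total-vm:%ldkB, "
		"anon-rss:%ldkB, file-rss:%ldkB, shmem-rss:%ldkB, "
		"pgtables:%ldkB oom_score_adj:%d",
		v->pid, v->comm, v->total_vm_kb, v->anon_rss_kb,
		v->file_rss_kb, v->shmem_rss_kb, v->pgtables_kb,
		v->oom_score_adj);
}
```

Keeping the new fields at the end of the line, as above, means existing scripts that match on the leading "Killed process %d (%s)" portion keep working unchanged.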
> If it is acceptable to also add the dump_task for the killed process
> for !sysctl_oom_dump_tasks I can repost the patch including that as
> well.
Well, I would rather focus on adding the missing pieces to the killed
task message instead.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] mm/oom: Add oom_score_adj value to oom Killed process message
2019-08-22 7:15 ` Michal Hocko
@ 2019-08-22 14:47 ` Edward Chron
0 siblings, 0 replies; 26+ messages in thread
From: Edward Chron @ 2019-08-22 14:47 UTC (permalink / raw)
To: Michal Hocko
Cc: David Rientjes, Andrew Morton, Roman Gushchin, Johannes Weiner,
Tetsuo Handa, Shakeel Butt, linux-mm, linux-kernel,
Ivan Delalande
On Thu, Aug 22, 2019 at 12:15 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 21-08-19 15:22:07, Edward Chron wrote:
> > On Wed, Aug 21, 2019 at 12:19 AM David Rientjes <rientjes@google.com> wrote:
> > >
> > > On Wed, 21 Aug 2019, Michal Hocko wrote:
> > >
> > > > > vm.oom_dump_tasks is pretty useful, however, so it's curious why you
> > > > > haven't left it enabled :/
> > > >
> > > > Because it generates a lot of output potentially. Think of a workload
> > > > with too many tasks which is not uncommon.
> > >
> > > Probably better to always print all the info for the victim so we don't
> > > need to duplicate everything between dump_tasks() and dump_oom_summary().
> > >
> > > Edward, how about this?
> > >
> > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > --- a/mm/oom_kill.c
> > > +++ b/mm/oom_kill.c
> > > @@ -420,11 +420,17 @@ static int dump_task(struct task_struct *p, void *arg)
> > > * State information includes task's pid, uid, tgid, vm size, rss,
> > > * pgtables_bytes, swapents, oom_score_adj value, and name.
> > > */
> > > -static void dump_tasks(struct oom_control *oc)
> > > +static void dump_tasks(struct oom_control *oc, struct task_struct *victim)
> > > {
> > > pr_info("Tasks state (memory values in pages):\n");
> > > pr_info("[ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name\n");
> > >
> > > + /* If vm.oom_dump_tasks is disabled, only show the victim */
> > > + if (!sysctl_oom_dump_tasks) {
> > > + dump_task(victim, oc);
> > > + return;
> > > + }
> > > +
> > > if (is_memcg_oom(oc))
> > > mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> > > else {
> > > @@ -465,8 +471,8 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
> > > if (is_dump_unreclaim_slabs())
> > > dump_unreclaimable_slab();
> > > }
> > > - if (sysctl_oom_dump_tasks)
> > > - dump_tasks(oc);
> > > + if (p || sysctl_oom_dump_tasks)
> > > + dump_tasks(oc, p);
> > > if (p)
> > > dump_oom_summary(oc, p);
> > > }
> >
> > I would be willing to accept this, though as Michal mentions in his
> > post, it would be very helpful to have the oom_score_adj on the Killed
> > process message.
> >
> > One reason for that is that the Killed process message is the one
> > message that is printed with error priority (pr_err)
> > and so that message can be filtered out and sent to notify support
> > that an OOM event occurred.
> > Putting any information that can be shared in that message is useful;
> > from my experience it is the initial point of triage for an OOM event,
> > even when the full per-process log is available.
> >
> > So from my perspective I would be happy having both, with David's
> > proposal providing a bit of extra information as shown here:
> >
> > Jul 21 20:07:48 linuxserver kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
> > Jul 21 20:07:48 linuxserver kernel: [    547]     0   547    31664      615         299008        0             0 systemd-journal
> >
> > The OOM Killed process message will print as:
> >
> > Jul 21 20:07:48 linuxserver kernel: Out of memory: Killed process 2826 (oomprocs) total-vm:1056800kB, anon-rss:1052784kB, file-rss:4kB, shmem-rss:0kB oom_score_adj:1000
> >
> > But if only one output change is allowed, I'd favor the Killed
> > process message, since it can be singled out by its print priority
> > and forwarded.
> >
> > By the way, right now there is redundancy in that the Killed process
> > message prints total-vm and rss even when vm.oom_dump_tasks is enabled.
> > I don't see why that is a big deal.
>
> There will always be redundancy there, because the dump_tasks part is
> there mostly to sanity-check the OOM victim selection for wrong or
> unexpected choices, while the "Killed..." message is there to report who
> has been killed. Most people really only care about that part.
>
> > It is very useful to have all the information that is there.
> > I wouldn't mind also having pgtables, but we would be able to get
> > that from the output of dump_task if that is enabled.
>
> I am not against adding pgtable information there. That memory is going
> to be released when the task dies.
Oh, thank-you, I will include that in the updated patch as it is useful information.
>
> > If it is acceptable to also add the dump_task for the killed process
> > for !sysctl_oom_dump_tasks I can repost the patch including that as
> > well.
>
> Well, I would rather focus on adding the missing pieces to the killed
> task message instead.
>
Will do.
> --
> Michal Hocko
> SUSE Labs
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] mm/oom: Add oom_score_adj value to oom Killed process message
2019-08-21 7:19 ` David Rientjes
@ 2019-08-22 15:18 ` Edward Chron
-1 siblings, 0 replies; 26+ messages in thread
From: Edward Chron @ 2019-08-22 15:18 UTC (permalink / raw)
To: David Rientjes
Cc: Michal Hocko, Andrew Morton, Roman Gushchin, Johannes Weiner,
Tetsuo Handa, Shakeel Butt, linux-mm, linux-kernel,
Ivan Delalande
On Wed, Aug 21, 2019 at 12:19 AM David Rientjes <rientjes@google.com> wrote:
>
> On Wed, 21 Aug 2019, Michal Hocko wrote:
>
> > > vm.oom_dump_tasks is pretty useful, however, so it's curious why you
> > > haven't left it enabled :/
> >
> > Because it generates a lot of output potentially. Think of a workload
> > with too many tasks which is not uncommon.
>
> Probably better to always print all the info for the victim so we don't
> need to duplicate everything between dump_tasks() and dump_oom_summary().
>
> Edward, how about this?
It is worth mentioning that David's suggested change certainly has
merit, though I agree with Michal that it should be a separate issue
from updating the OOM Killed process message; it's not strictly
necessary for what I was asking for.
If you have scripts that scan your logs for OOM events, having a regular
format for the OOM output makes parsing easier. With David's suggestion
there would always be a "Tasks state" section, and vm.oom_dump_tasks
still works: disabling it only suppresses the dump of all the other
tasks, not of the killed process.
The OOM output was reorganized not that long ago, as we discussed
earlier, to improve the organization of the data, so this proposal would
be in line with that change.
If there is interest in this I can submit a separate patch submission.
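As a rough illustration of the kind of log-scanning script mentioned above (a hypothetical userspace helper, not anything from the kernel tree), the fixed-format Killed-process line quoted earlier in the thread can be picked apart with a single sscanf pattern:

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>  /* only needed for demo checks */

/* Hypothetical helper: extract the victim's pid, comm, total-vm,
 * anon-rss and oom_score_adj from a "Killed process" console line.
 * Returns 0 on success, -1 if the line does not match. */
static int parse_oom_kill(const char *line, int *pid, char comm[64],
			  long *total_vm_kb, long *anon_rss_kb,
			  int *oom_score_adj)
{
	const char *p = strstr(line, "Out of memory: Killed process");

	if (!p)
		return -1;
	/* %*d suppresses fields we do not need (file-rss, shmem-rss). */
	if (sscanf(p, "Out of memory: Killed process %d (%63[^)])"
		      " total-vm:%ldkB, anon-rss:%ldkB,"
		      " file-rss:%*dkB, shmem-rss:%*dkB oom_score_adj:%d",
		   pid, comm, total_vm_kb, anon_rss_kb, oom_score_adj) != 5)
		return -1;
	return 0;
}
```

This is exactly the point about a regular format: because the line is emitted at error priority in one fixed shape, a single pattern like this can feed an alerting pipeline without having to correlate the multi-line Tasks state table.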
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -420,11 +420,17 @@ static int dump_task(struct task_struct *p, void *arg)
> * State information includes task's pid, uid, tgid, vm size, rss,
> * pgtables_bytes, swapents, oom_score_adj value, and name.
> */
> -static void dump_tasks(struct oom_control *oc)
> +static void dump_tasks(struct oom_control *oc, struct task_struct *victim)
> {
> pr_info("Tasks state (memory values in pages):\n");
> pr_info("[ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name\n");
>
> + /* If vm.oom_dump_tasks is disabled, only show the victim */
> + if (!sysctl_oom_dump_tasks) {
> + dump_task(victim, oc);
> + return;
> + }
> +
> if (is_memcg_oom(oc))
> mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> else {
> @@ -465,8 +471,8 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
> if (is_dump_unreclaim_slabs())
> dump_unreclaimable_slab();
> }
> - if (sysctl_oom_dump_tasks)
> - dump_tasks(oc);
> + if (p || sysctl_oom_dump_tasks)
> + dump_tasks(oc, p);
> if (p)
> dump_oom_summary(oc, p);
> }
^ permalink raw reply [flat|nested] 26+ messages in thread