linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Add the memcg print oom info for system oom
@ 2018-05-17  7:00 ufo19890607
  2018-05-17  7:11 ` Michal Hocko
  2018-05-19 21:18 ` kbuild test robot
  0 siblings, 2 replies; 8+ messages in thread
From: ufo19890607 @ 2018-05-17  7:00 UTC (permalink / raw)
  To: akpm, mhocko, rientjes, kirill.shutemov, aarcange,
	penguin-kernel, guro, yang.s
  Cc: linux-mm, linux-kernel, yuzhoujian

From: yuzhoujian <yuzhoujian@didichuxing.com>

The dump_header does not print the memcg's name when the system
oom happened. Some users want to locate the certain container
which contains the task that has been killed by the oom killer.
So I add the mem_cgroup_print_oom_info when system oom events
happened.

Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com>
---
 mm/oom_kill.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8ba6cb88cf58..244416c9834a 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -433,6 +433,7 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
 	if (is_memcg_oom(oc))
 		mem_cgroup_print_oom_info(oc->memcg, p);
 	else {
+		mem_cgroup_print_oom_info(mem_cgroup_from_task(p), p);
 		show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
 		if (is_dump_unreclaim_slabs())
 			dump_unreclaimable_slab();
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add the memcg print oom info for system oom
  2018-05-17  7:00 [PATCH] Add the memcg print oom info for system oom ufo19890607
@ 2018-05-17  7:11 ` Michal Hocko
       [not found]   ` <CAHCio2gOLnj4NpkFrxpYVygg6ZeSeuwgp2Lwr6oTHRxHpbmcWw@mail.gmail.com>
  2018-05-19 21:18 ` kbuild test robot
  1 sibling, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2018-05-17  7:11 UTC (permalink / raw)
  To: ufo19890607
  Cc: akpm, rientjes, kirill.shutemov, aarcange, penguin-kernel, guro,
	yang.s, linux-mm, linux-kernel, yuzhoujian

On Thu 17-05-18 08:00:28, ufo19890607 wrote:
> From: yuzhoujian <yuzhoujian@didichuxing.com>
> 
> The dump_header does not print the memcg's name when the system
> oom happened. Some users want to locate the certain container
> which contains the task that has been killed by the oom killer.
> So I add the mem_cgroup_print_oom_info when system oom events
> happened.

The oom report is quite heavy today. Do we really need the full memcg
oom report here. Wouldn't it be sufficient to print the memcg the task
belongs to?

> Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com>
> ---
>  mm/oom_kill.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 8ba6cb88cf58..244416c9834a 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -433,6 +433,7 @@ static void dump_header(struct oom_control *oc, struct task_struct *p)
>  	if (is_memcg_oom(oc))
>  		mem_cgroup_print_oom_info(oc->memcg, p);
>  	else {
> +		mem_cgroup_print_oom_info(mem_cgroup_from_task(p), p);
>  		show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
>  		if (is_dump_unreclaim_slabs())
>  			dump_unreclaimable_slab();
> -- 
> 2.14.1
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add the memcg print oom info for system oom
       [not found]   ` <CAHCio2gOLnj4NpkFrxpYVygg6ZeSeuwgp2Lwr6oTHRxHpbmcWw@mail.gmail.com>
@ 2018-05-17 10:23     ` Michal Hocko
  2018-05-17 10:42       ` Roman Gushchin
  2018-05-21 21:11       ` David Rientjes
  0 siblings, 2 replies; 8+ messages in thread
From: Michal Hocko @ 2018-05-17 10:23 UTC (permalink / raw)
  To: 禹舟键
  Cc: akpm, rientjes, kirill.shutemov, aarcange, penguin-kernel, guro,
	yang.s, linux-mm, linux-kernel, Wind Yu

On Thu 17-05-18 17:44:43, 禹舟键 wrote:
> Hi Michal
> I think the current OOM report is imcomplete. I can get the task which
> invoked the oom-killer and the task which has been killed by the
> oom-killer, and memory info when the oom happened. But I cannot infer the
> certain memcg to which the task killed by oom-killer belongs, because that
> task has been killed, and the dump_task will print all of the tasks in the
> system.

I can see how the origin memcg might be useful, but ...
> 
> mem_cgroup_print_oom_info will print five lines of content including
> memcg's name , usage, limit. I don't think five lines of content will cause
> a big problem. Or it at least prints the memcg's name.

this is not 5 lines at all. We dump memcg stats for the whole oom memcg
subtree. For your patch it would be the whole subtree of the memcg of
the oom victim. With cgroup v1 this can be quite deep as tasks can
belong to inter-nodes as well. Would be

		pr_info("Task in ");
		pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
		pr_cont(" killed as a result of limit of ");

part of that output sufficient for your usecase? You will not get memory
consumption of the group but is that really so relevant when we are
killing individual tasks? Please note that there are proposals to make
the global oom killer memcg aware and select by the memcg size rather
than pick on random tasks
(http://lkml.kernel.org/r/20171130152824.1591-1-guro@fb.com). Maybe that
will be more interesting for your container usecase.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add the memcg print oom info for system oom
  2018-05-17 10:23     ` Michal Hocko
@ 2018-05-17 10:42       ` Roman Gushchin
  2018-05-21 21:11       ` David Rientjes
  1 sibling, 0 replies; 8+ messages in thread
From: Roman Gushchin @ 2018-05-17 10:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: 禹舟键,
	akpm, rientjes, kirill.shutemov, aarcange, penguin-kernel,
	yang.s, linux-mm, linux-kernel, Wind Yu

On Thu, May 17, 2018 at 12:23:30PM +0200, Michal Hocko wrote:
> On Thu 17-05-18 17:44:43, 禹舟键 wrote:
> > Hi Michal
> > I think the current OOM report is imcomplete. I can get the task which
> > invoked the oom-killer and the task which has been killed by the
> > oom-killer, and memory info when the oom happened. But I cannot infer the
> > certain memcg to which the task killed by oom-killer belongs, because that
> > task has been killed, and the dump_task will print all of the tasks in the
> > system.
> 
> I can see how the origin memcg might be useful, but ...
> > 
> > mem_cgroup_print_oom_info will print five lines of content including
> > memcg's name , usage, limit. I don't think five lines of content will cause
> > a big problem. Or it at least prints the memcg's name.

I want only add here that if system-wide OOM is a rare event, you can look
at per-cgroup oom counters to find the cgroup, which contained the killed
task. Not super handy, but might work for debug purposes.

> this is not 5 lines at all. We dump memcg stats for the whole oom memcg
> subtree. For your patch it would be the whole subtree of the memcg of
> the oom victim. With cgroup v1 this can be quite deep as tasks can
> belong to inter-nodes as well. Would be
> 
> 		pr_info("Task in ");
> 		pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
> 		pr_cont(" killed as a result of limit of ");
> 
> part of that output sufficient for your usecase? You will not get memory
> consumption of the group but is that really so relevant when we are
> killing individual tasks? Please note that there are proposals to make
> the global oom killer memcg aware and select by the memcg size rather
> than pick on random tasks
> (http://lkml.kernel.org/r/20171130152824.1591-1-guro@fb.com). Maybe that
> will be more interesting for your container usecase.

Speaking about memcg OOM reports more broadly, IMO
rather than spam with memcg-local OOM dumps to dmesg,
it's better to add a new interface to read memcg-specific OOM reports.

The current dmesg OOM report contains a lot of low-level stuff,
which is handy for debugging system-wide OOM issues,
and memcg-aware stuff too; that makes it bulky.

Anyway, Michal's 1-line proposal looks quite acceptable to me.

Thanks!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add the memcg print oom info for system oom
  2018-05-17  7:00 [PATCH] Add the memcg print oom info for system oom ufo19890607
  2018-05-17  7:11 ` Michal Hocko
@ 2018-05-19 21:18 ` kbuild test robot
  1 sibling, 0 replies; 8+ messages in thread
From: kbuild test robot @ 2018-05-19 21:18 UTC (permalink / raw)
  To: ufo19890607
  Cc: kbuild-all, akpm, mhocko, rientjes, kirill.shutemov, aarcange,
	penguin-kernel, guro, yang.s, linux-mm, linux-kernel, yuzhoujian

[-- Attachment #1: Type: text/plain, Size: 2664 bytes --]

Hi yuzhoujian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.17-rc5 next-20180517]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/ufo19890607/Add-the-memcg-print-oom-info-for-system-oom/20180520-041924
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: x86_64-randconfig-x003-201820 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All error/warnings (new ones prefixed by >>):

   mm/oom_kill.c: In function 'dump_header':
>> mm/oom_kill.c:436:29: error: implicit declaration of function 'mem_cgroup_from_task'; did you mean 'mem_cgroup_from_id'? [-Werror=implicit-function-declaration]
      mem_cgroup_print_oom_info(mem_cgroup_from_task(p), p);
                                ^~~~~~~~~~~~~~~~~~~~
                                mem_cgroup_from_id
>> mm/oom_kill.c:436:29: warning: passing argument 1 of 'mem_cgroup_print_oom_info' makes pointer from integer without a cast [-Wint-conversion]
   In file included from include/linux/swap.h:9:0,
                    from mm/oom_kill.c:28:
   include/linux/memcontrol.h:932:1: note: expected 'struct mem_cgroup *' but argument is of type 'int'
    mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
    ^~~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +436 mm/oom_kill.c

   421	
   422	static void dump_header(struct oom_control *oc, struct task_struct *p)
   423	{
   424		pr_warn("%s invoked oom-killer: gfp_mask=%#x(%pGg), nodemask=%*pbl, order=%d, oom_score_adj=%hd\n",
   425			current->comm, oc->gfp_mask, &oc->gfp_mask,
   426			nodemask_pr_args(oc->nodemask), oc->order,
   427				current->signal->oom_score_adj);
   428		if (!IS_ENABLED(CONFIG_COMPACTION) && oc->order)
   429			pr_warn("COMPACTION is disabled!!!\n");
   430	
   431		cpuset_print_current_mems_allowed();
   432		dump_stack();
   433		if (is_memcg_oom(oc))
   434			mem_cgroup_print_oom_info(oc->memcg, p);
   435		else {
 > 436			mem_cgroup_print_oom_info(mem_cgroup_from_task(p), p);
   437			show_mem(SHOW_MEM_FILTER_NODES, oc->nodemask);
   438			if (is_dump_unreclaim_slabs())
   439				dump_unreclaimable_slab();
   440		}
   441		if (sysctl_oom_dump_tasks)
   442			dump_tasks(oc->memcg, oc->nodemask);
   443	}
   444	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27596 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add the memcg print oom info for system oom
  2018-05-17 10:23     ` Michal Hocko
  2018-05-17 10:42       ` Roman Gushchin
@ 2018-05-21 21:11       ` David Rientjes
  2018-05-22  6:37         ` Michal Hocko
  1 sibling, 1 reply; 8+ messages in thread
From: David Rientjes @ 2018-05-21 21:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: 禹舟键,
	akpm, kirill.shutemov, aarcange, penguin-kernel, guro, yang.s,
	linux-mm, linux-kernel, Wind Yu

On Thu, 17 May 2018, Michal Hocko wrote:

> this is not 5 lines at all. We dump memcg stats for the whole oom memcg
> subtree. For your patch it would be the whole subtree of the memcg of
> the oom victim. With cgroup v1 this can be quite deep as tasks can
> belong to inter-nodes as well. Would be
> 
> 		pr_info("Task in ");
> 		pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
> 		pr_cont(" killed as a result of limit of ");
> 
> part of that output sufficient for your usecase?

There's no memcg to print as the limit in the above, but it does seem like 
the single line output is all that is needed in this case.

It might be useful to discuss a single line output that specifies relevant 
information about the context of the oom kill, the killed thread, and the 
memcg of that thread, in a way that will be backwards compatible.  The 
messages in the oom killer have been restructured over time, I don't 
believe there is a backwards compatible way to search for an oom event in 
the kernel log.

I've had success with defining a single line output the includes the 
CONSTRAINT_* of the oom kill, the origin and kill memcgs, the thread name, 
pid, and uid.  On system oom kills, origin and kill memcgs are left empty.

oom-kill constraint=CONSTRAINT_* origin_memcg=<memcg> kill_memcg=<memcg> task=<comm> pid=<pid> uid=<uid>

Perhaps we should introduce a single line output that will be backwards 
compatible that includes this information?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add the memcg print oom info for system oom
  2018-05-21 21:11       ` David Rientjes
@ 2018-05-22  6:37         ` Michal Hocko
  2018-05-22 22:54           ` David Rientjes
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2018-05-22  6:37 UTC (permalink / raw)
  To: David Rientjes
  Cc: 禹舟键,
	akpm, kirill.shutemov, aarcange, penguin-kernel, guro, yang.s,
	linux-mm, linux-kernel, Wind Yu

On Mon 21-05-18 14:11:21, David Rientjes wrote:
> On Thu, 17 May 2018, Michal Hocko wrote:
> 
> > this is not 5 lines at all. We dump memcg stats for the whole oom memcg
> > subtree. For your patch it would be the whole subtree of the memcg of
> > the oom victim. With cgroup v1 this can be quite deep as tasks can
> > belong to inter-nodes as well. Would be
> > 
> > 		pr_info("Task in ");
> > 		pr_cont_cgroup_path(task_cgroup(p, memory_cgrp_id));
> > 		pr_cont(" killed as a result of limit of ");
> > 
> > part of that output sufficient for your usecase?
> 
> There's no memcg to print as the limit in the above, but it does seem like 
> the single line output is all that is needed in this case.

Yeah, that is exactly what I was proposing. I just copy&pasted the whole
part to make it clear which part of mem_cgroup_print_oom_info I meant.
Referring to "killed as a reslt of limit of" was misleading. Sorry about
that.

> It might be useful to discuss a single line output that specifies relevant 
> information about the context of the oom kill, the killed thread, and the 
> memcg of that thread, in a way that will be backwards compatible.  The 
> messages in the oom killer have been restructured over time, I don't 
> believe there is a backwards compatible way to search for an oom event in 
> the kernel log.

Agreed
 
> I've had success with defining a single line output the includes the 
> CONSTRAINT_* of the oom kill, the origin and kill memcgs, the thread name, 
> pid, and uid.  On system oom kills, origin and kill memcgs are left empty.
> 
> oom-kill constraint=CONSTRAINT_* origin_memcg=<memcg> kill_memcg=<memcg> task=<comm> pid=<pid> uid=<uid>
> 
> Perhaps we should introduce a single line output that will be backwards 
> compatible that includes this information?

I do not have a strong preference here. We already print cpuset on its
own line and we can do the same for the memcg.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Add the memcg print oom info for system oom
  2018-05-22  6:37         ` Michal Hocko
@ 2018-05-22 22:54           ` David Rientjes
  0 siblings, 0 replies; 8+ messages in thread
From: David Rientjes @ 2018-05-22 22:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: 禹舟键,
	akpm, kirill.shutemov, aarcange, penguin-kernel, guro, yang.s,
	linux-mm, linux-kernel, Wind Yu

On Tue, 22 May 2018, Michal Hocko wrote:

> > I've had success with defining a single line output the includes the 
> > CONSTRAINT_* of the oom kill, the origin and kill memcgs, the thread name, 
> > pid, and uid.  On system oom kills, origin and kill memcgs are left empty.
> > 
> > oom-kill constraint=CONSTRAINT_* origin_memcg=<memcg> kill_memcg=<memcg> task=<comm> pid=<pid> uid=<uid>
> > 
> > Perhaps we should introduce a single line output that will be backwards 
> > compatible that includes this information?
> 
> I do not have a strong preference here. We already print cpuset on its
> own line and we can do the same for the memcg.
> 

Yes, for both the memcg that has reached its limit (origin_memcg) and the 
memcg the killed process is attached to (kill_memcg).

It's beneficial to have a single-line output to avoid any printk 
interleaving or ratelimiting that includes the constraint, comm, and at 
least pid.  (We include uid simply to find oom kills of root processes.)

We already have all this information, including cpuset, cpuset nodemask, 
and allocation nodemask for mempolicy ooms.  The only exception appears to 
be the kill_memcg for CONSTRAINT_NONE and for it to be emitted in a way 
that can't be interleaved or suppressed.

Perhaps we can have this?

oom-kill constraint=CONSTRAINT_* nodemask=<cpuset/mempolicy nodemask> origin_memcg=<memcg> kill_memcg=<memcg> task=<comm> pid=<pid> uid=<uid>

For CONSTRAINT_NONE, nodemask and origin_memcg are empty.  For 
CONSTRAINT_CPUSET and CONSTRAINT_MEMORY_POLICY, origin_memcg is empty.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-05-22 22:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-17  7:00 [PATCH] Add the memcg print oom info for system oom ufo19890607
2018-05-17  7:11 ` Michal Hocko
     [not found]   ` <CAHCio2gOLnj4NpkFrxpYVygg6ZeSeuwgp2Lwr6oTHRxHpbmcWw@mail.gmail.com>
2018-05-17 10:23     ` Michal Hocko
2018-05-17 10:42       ` Roman Gushchin
2018-05-21 21:11       ` David Rientjes
2018-05-22  6:37         ` Michal Hocko
2018-05-22 22:54           ` David Rientjes
2018-05-19 21:18 ` kbuild test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).