From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753133AbbJZViQ (ORCPT ); Mon, 26 Oct 2015 17:38:16 -0400
Received: from mail-pa0-f44.google.com ([209.85.220.44]:36799 "EHLO
	mail-pa0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752495AbbJZViO (ORCPT );
	Mon, 26 Oct 2015 17:38:14 -0400
Date: Mon, 26 Oct 2015 14:38:11 -0700 (PDT)
From: David Rientjes
X-X-Sender: rientjes@chino.kir.corp.google.com
To: Aristeu Rozanski
cc: linux-kernel@vger.kernel.org, Greg Thelen, Johannes Weiner,
	linux-mm@kvack.org, cgroups@vger.kernel.org
Subject: Re: [PATCH] oom_kill: add option to disable dump_stack()
In-Reply-To: <1445634150-27992-1-git-send-email-arozansk@redhat.com>
Message-ID:
References: <1445634150-27992-1-git-send-email-arozansk@redhat.com>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 23 Oct 2015, Aristeu Rozanski wrote:

> One of the largest chunks of log messages in a OOM is from dump_stack() and in
> some cases it isn't even necessary to figure out what's going on. In
> systems with multiple tenants/containers with limited resources each
> OOMs can be way more frequent and being able to reduce the amount of log
> output for each situation is useful.
>
> This patch adds a sysctl to allow disabling dump_stack() during an OOM while
> keeping the default to behave the same way it behaves today.
>
> Cc: Greg Thelen
> Cc: Johannes Weiner
> Cc: linux-mm@kvack.org
> Cc: cgroups@vger.kernel.org
> Signed-off-by: Aristeu Rozanski

There's lots of information in the oom log that is irrelevant depending on
the context in which the oom condition occurred. Removing the stack trace
would have made things like commit 9a185e5861e8 ("/proc/stat: convert to
single_open_size()") harder to fix.
In that case, we were calling the oom killer on large file reads from
procfs when we could easily have used vmalloc() instead.

When you have a memcg oom kill, the dump of the system's memory state can
usually be suppressed: the kill only occurred because a memcg hierarchy
reached its limit, which has nothing to do with the exhaustion of RAM.

We already control oom output with global sysctls like vm.oom_dump_tasks
and memcg tunables like memory.oom_verbose. I'm not sure that adding more
and more tunables simply to control the oom killer output is in the best
interest of either procfs or a long-term maintainable kernel.

I can understand the usefulness of having a very small amount of output to
the kernel log and then enabling tunables to investigate why oom kills are
happening, but in many situations I've found that only the oom killer
output is left behind in a kernel log and the situation is no longer
ongoing, so I can't start diagnosing the problem if I don't know what
triggered it.

I think adding additional sysctls to control oom killer output is the
wrong direction. I do agree with removing anything that is irrelevant in
all situations, however.